[jira] [Created] (IMPALA-13264) bin/coverage_helper.sh should always use gcov from the toolchain

2024-07-31 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13264:
--

 Summary: bin/coverage_helper.sh should always use gcov from the 
toolchain
 Key: IMPALA-13264
 URL: https://issues.apache.org/jira/browse/IMPALA-13264
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


bin/coverage_helper.sh gets gcov from the toolchain if it is not installed on 
the system.
{noformat}
if ! which gcov > /dev/null; then
  export PATH="$PATH:$IMPALA_TOOLCHAIN_PACKAGES_HOME/gcc-$IMPALA_GCC_VERSION/bin"
fi
echo "Using gcov at `which gcov`"{noformat}
Since the toolchain compiler can be different from the system compiler, I think 
it makes more sense to always use gcov from the toolchain's GCC. Then the gcov 
version will always match the GCC version.
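The proposed behavior amounts to prepending the toolchain's GCC bin directory instead of appending it only as a fallback. A sketch in Python rather than the script's shell (the function name is hypothetical; the environment variable names come from the snippet above):

```python
import os
import shutil

def use_toolchain_gcov():
    """Prepend the toolchain GCC's bin directory to PATH and resolve gcov.

    Prepending (rather than appending only when no system gcov exists) makes
    the toolchain's gcov win even when a system gcov is installed, so the gcov
    version always matches the toolchain GCC version.
    """
    toolchain = os.environ["IMPALA_TOOLCHAIN_PACKAGES_HOME"]
    gcc_version = os.environ["IMPALA_GCC_VERSION"]
    gcc_bin = os.path.join(toolchain, "gcc-" + gcc_version, "bin")
    os.environ["PATH"] = gcc_bin + os.pathsep + os.environ.get("PATH", "")
    return shutil.which("gcov")
```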



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-26 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17869039#comment-17869039
 ] 

Joe McDonnell commented on IMPALA-13253:


The AWS LB has an idle time limit of 350 seconds that does not explicitly 
notify either end that the connection is dead: 
[https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-configurable-idle-timeout-for-connection-tracking/]

The libkeepalive library can be used to force a program to use TCP keepalive 
without needing to recompile it: 
[https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html]

Testing with libkeepalive and iptables shows that TCP keepalive behaves as 
expected: it handles situations where packets are dropped or rejected. In a 
cluster behind the AWS LB, the keepalive time can be set to 400 seconds to 
quickly detect and close connections that the AWS LB already considers idle.

I think keepalive should be on by default.
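For illustration, the tuning described above can be sketched with plain Python sockets (a sketch only; the actual change would go in Impala's backend, and the TCP_KEEP* socket options below are Linux-specific):

```python
import socket

def enable_keepalive(sock, idle_s=400, interval_s=10, count=3):
    """Enable TCP keepalive on a socket.

    idle_s: seconds of inactivity before the first keepalive probe. Setting
    this above the AWS LB's 350-second idle limit means probes on an
    LB-dropped connection go unanswered, so the dead connection is detected
    and closed shortly after the LB gives up on it.
    interval_s/count: spacing and number of unanswered probes before the
    connection is declared dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific per-socket overrides of the kernel keepalive defaults.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

libkeepalive achieves the same effect for unmodified binaries by interposing on `socket()` via LD_PRELOAD.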

> Add option to use TCP keepalives for client connections
> ---
>
> Key: IMPALA-13253
> URL: https://issues.apache.org/jira/browse/IMPALA-13253
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Clients
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Blocker
>
> A client can be disconnected without explicitly closing its TCP connection. 
> This can happen if the client machine resets or there is a network 
> disruption. In particular, load balancers can have an idle time that results 
> in a connection becoming invalid. Impala can't guarantee that the client 
> will properly tear down its connection, so the Impala-side resources may 
> never be released.
> TCP keepalive would allow Impala to detect dead clients and close the 
> connection. It also can prevent a load balancer from seeing the connection as 
> idle. This can be important for clients that hold connections in a pool.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13253:
---
Priority: Blocker  (was: Critical)

> Add option to use TCP keepalives for client connections
> ---
>
> Key: IMPALA-13253
> URL: https://issues.apache.org/jira/browse/IMPALA-13253
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Clients
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Blocker
>
> A client can be disconnected without explicitly closing its TCP connection. 
> This can happen if the client machine resets or there is a network 
> disruption. In particular, load balancers can have an idle time that results 
> in a connection becoming invalid. Impala can't guarantee that the client 
> will properly tear down its connection, so the Impala-side resources may 
> never be released.
> TCP keepalive would allow Impala to detect dead clients and close the 
> connection. It also can prevent a load balancer from seeing the connection as 
> idle. This can be important for clients that hold connections in a pool.






[jira] [Commented] (IMPALA-13202) Impala workloads can exceed Kudu client's rpc_max_message_size limit

2024-07-24 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868423#comment-17868423
 ] 

Joe McDonnell commented on IMPALA-13202:


I filed https://issues.apache.org/jira/browse/KUDU-3595 for the Kudu-side 
change. This Jira will track the Impala side change to pick up a new Kudu and 
add a startup parameter to set it.

> Impala workloads can exceed Kudu client's rpc_max_message_size limit
> 
>
> Key: IMPALA-13202
> URL: https://issues.apache.org/jira/browse/IMPALA-13202
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: data.parquet
>
>
> Impala integrates with KRPC by porting the KRPC code into the Impala code 
> base. KRPC flags and methods are defined as GLOBAL symbols in the impalad 
> executable. libkudu_client.so is compiled from the same KRPC code and has 
> duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the 
> symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>      8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad 
> executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep FLAGS_rpc_max_message_size
>  14380: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1     1 OBJECT  LOCAL  HIDDEN    27 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8     8 OBJECT  LOCAL  DEFAULT   27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own copies of these methods and flags. The flags 
> are HIDDEN, so they can't be modified by Impala code. For example, 
> IMPALA-4874 bumps FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but 
> the HIDDEN FLAGS_rpc_max_message_size used inside libkudu_client.so keeps 
> its default value of 50MB (52428800). We've seen error messages like this 
> on the master branch:
> {code:java}
> I0708 10:23:31.784974  2943 meta_cache.cc:294] c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: Network error: TS failed: RPC frame had a length of 53477464, but we only support messages up to 52428800 bytes long.{code}
> CC [~joemcdonnell] [~wzhou] [~aserbin] 






[jira] [Commented] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets

2024-07-24 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868420#comment-17868420
 ] 

Joe McDonnell commented on IMPALA-13183:


Here is an AWS blog post about how the AWS LB works: 
[https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-configurable-idle-timeout-for-connection-tracking/]

The section "Scenario #1: TCP connections through AWS Services" explains that 
the load balancer doesn't notify either endpoint when a connection goes idle; 
an endpoint only finds out when it next sends a message. I think this is a 
problem for Impala, and an idle connection timeout would be one way to avoid 
issues.
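As an illustration of why an idle connection timeout helps here (a minimal Python sketch, not Impala's actual Thrift transport code): a receive with a timeout lets the server notice a silently dead or idle peer instead of blocking forever.

```python
import socket

def recv_with_idle_timeout(sock, idle_timeout_s):
    """Receive data, returning None if the peer is idle past the limit."""
    sock.settimeout(idle_timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # No traffic within the limit: treat the connection as idle/dead
        # and let the caller close it, freeing the service thread.
        return None
```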

> Add default timeout for hs2/beeswax server sockets
> --
>
> Key: IMPALA-13183
> URL: https://issues.apache.org/jira/browse/IMPALA-13183
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Currently Impala only sets timeouts for specific operations, for example 
> during the SASL handshake and when checking whether a connection can be 
> closed due to an idle session.
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/rpc/TAcceptQueueServer.cpp#L153
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/transport/TSaslServerTransport.cpp#L145
> There are several cases where an inactive client could keep the connection 
> open indefinitely, for example if it hasn't opened a session yet.
> I think there should be a longer general timeout set for both send/recv, 
> e.g. a flag client_default_timeout_s=3600.






[jira] [Updated] (IMPALA-13202) Impala workloads can exceed Kudu client's rpc_max_message_size limit

2024-07-24 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13202:
---
Summary: Impala workloads can exceed Kudu client's rpc_max_message_size 
limit  (was: KRPC flags used by libkudu_client.so can't be configured)

> Impala workloads can exceed Kudu client's rpc_max_message_size limit
> 
>
> Key: IMPALA-13202
> URL: https://issues.apache.org/jira/browse/IMPALA-13202
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: data.parquet
>
>
> Impala integrates with KRPC by porting the KRPC code into the Impala code 
> base. KRPC flags and methods are defined as GLOBAL symbols in the impalad 
> executable. libkudu_client.so is compiled from the same KRPC code and has 
> duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the 
> symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>      8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad 
> executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep FLAGS_rpc_max_message_size
>  14380: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1     1 OBJECT  LOCAL  HIDDEN    27 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8     8 OBJECT  LOCAL  DEFAULT   27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own copies of these methods and flags. The flags 
> are HIDDEN, so they can't be modified by Impala code. For example, 
> IMPALA-4874 bumps FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but 
> the HIDDEN FLAGS_rpc_max_message_size used inside libkudu_client.so keeps 
> its default value of 50MB (52428800). We've seen error messages like this 
> on the master branch:
> {code:java}
> I0708 10:23:31.784974  2943 meta_cache.cc:294] c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: Network error: TS failed: RPC frame had a length of 53477464, but we only support messages up to 52428800 bytes long.{code}
> CC [~joemcdonnell] [~wzhou] [~aserbin] 






[jira] [Commented] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets

2024-07-23 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868178#comment-17868178
 ] 

Joe McDonnell commented on IMPALA-13183:


I was just about to file a Jira about having functionality to close idle 
connections. This sounds similar, so I'm commenting here. We can split it off 
if it is not quite the same.

Basically, there is currently no mechanism to close idle connections that have 
no session. Hue and other clients that use a connection pool can create such 
connections. For example, Hue might want to close a query that was executed by 
a different connection: it opens a connection using the existing session, then 
finds that the query/session was already closed. That connection ends up with 
no associated session and can stay that way indefinitely.

We have seen cases where these connections stay open on the server side even 
after the client tries to close them. That seems to happen with certain load 
balancers, and it can cause the server to run out of fe service threads.

> Add default timeout for hs2/beeswax server sockets
> --
>
> Key: IMPALA-13183
> URL: https://issues.apache.org/jira/browse/IMPALA-13183
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Currently Impala only sets timeouts for specific operations, for example 
> during the SASL handshake and when checking whether a connection can be 
> closed due to an idle session.
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/rpc/TAcceptQueueServer.cpp#L153
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/transport/TSaslServerTransport.cpp#L145
> There are several cases where an inactive client could keep the connection 
> open indefinitely, for example if it hasn't opened a session yet.
> I think there should be a longer general timeout set for both send/recv, 
> e.g. a flag client_default_timeout_s=3600.






[jira] [Created] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-23 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13253:
--

 Summary: Add option to use TCP keepalives for client connections
 Key: IMPALA-13253
 URL: https://issues.apache.org/jira/browse/IMPALA-13253
 Project: IMPALA
  Issue Type: Task
  Components: Backend, Clients
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A client can be disconnected without explicitly closing its TCP connection. 
This can happen if the client machine resets or there is a network disruption. 
In particular, load balancers can have an idle time that results in a 
connection becoming invalid. Impala can't guarantee that the client will 
properly tear down its connection, so the Impala-side resources may never be 
released.

TCP keepalive would allow Impala to detect dead clients and close the 
connection. It also can prevent a load balancer from seeing the connection as 
idle. This can be important for clients that hold connections in a pool.








[jira] [Commented] (IMPALA-13230) Add a way to dump stack traces for impala-shell while it is running

2024-07-16 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866577#comment-17866577
 ] 

Joe McDonnell commented on IMPALA-13230:


Example stack trace while running a query:
{noformat}
  File "shell/build/python3_venv/bin/impala-shell", line 8, in <module>
    sys.exit(impala_shell_main())
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py", line 2305, in impala_shell_main
    shell.cmdloop(intro)
  File "/usr/lib/python3.8/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py", line 788, in onecmd
    return func(arg)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py", line 1239, in do_select
    return self._execute_stmt(query_str, print_web_link=True)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py", line 1426, in _execute_stmt
    for rows in rows_fetched:
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py", line 926, in fetch
    resp = self._do_hs2_rpc(FetchResults, req)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py", line 1148, in _do_hs2_rpc
    rpc_output = rpc(rpc_input)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py", line 920, in FetchResults
    return self.imp_service.FetchResults(req)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/TCLIService/TCLIService.py", line 756, in FetchResults
    return self.recv_FetchResults()
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/TCLIService/TCLIService.py", line 768, in recv_FetchResults
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
    sz = self.readI32()
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
    buff = self.trans.readAll(4)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
    chunk = self.read(sz - have)
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 164, in read
    self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TSocket.py", line 150, in read
    buff = self.handle.recv(sz)
{noformat}

> Add a way to dump stack traces for impala-shell while it is running
> ---
>
> Key: IMPALA-13230
> URL: https://issues.apache.org/jira/browse/IMPALA-13230
> Project: IMPALA
>  Issue Type: Task
>  Components: Clients
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> It can be useful to get the Python stack traces for impala-shell when it is 
> stuck. There is a nice thread on Stack Overflow about how to do this: 
> [https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application]
> One option is to install a signal handler for the SIGUSR1 signal and use that 
> to dump a backtrace. I tried this and it works for Python 3 (but causes 
> issues for running queries on Python 2):
> {noformat}
>     # For debugging, it is useful to handle the SIGUSR1 signal and use it
>     # to print a stacktrace.
>     signal.signal(signal.SIGUSR1,
>                   lambda sig, stack: traceback.print_stack(stack)){noformat}
> Another option mentioned is the faulthandler module 
> ([https://docs.python.org/dev/library/faulthandler.html]), which provides a 
> way to do the same thing. The faulthandler module seems to be able to do 
> this for all threads, not just the main thread.
> Either way, this would give us some options if we need to debug impala-shell 
> out in the wild.




[jira] [Created] (IMPALA-13230) Add a way to dump stack traces for impala-shell while it is running

2024-07-16 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13230:
--

 Summary: Add a way to dump stack traces for impala-shell while it 
is running
 Key: IMPALA-13230
 URL: https://issues.apache.org/jira/browse/IMPALA-13230
 Project: IMPALA
  Issue Type: Task
  Components: Clients
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


It can be useful to get the Python stack traces for impala-shell when it is 
stuck. There is a nice thread on Stack Overflow about how to do this: 
[https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application]

One option is to install a signal handler for the SIGUSR1 signal and use that 
to dump a backtrace. I tried this and it works for Python 3 (but causes issues 
for running queries on Python 2):
{noformat}
    # For debugging, it is useful to handle the SIGUSR1 signal and use it
    # to print a stacktrace.
    signal.signal(signal.SIGUSR1,
                  lambda sig, stack: traceback.print_stack(stack)){noformat}
Another option mentioned is the faulthandler module 
([https://docs.python.org/dev/library/faulthandler.html]), which provides a 
way to do the same thing. The faulthandler module seems to be able to do this 
for all threads, not just the main thread.

Either way, this would give us some options if we need to debug impala-shell 
out in the wild.
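A minimal sketch of the faulthandler variant (assuming Python 3; unlike a plain signal.signal handler, faulthandler dumps every thread):

```python
import faulthandler
import signal
import sys

# On SIGUSR1, dump the stack of all threads to stderr without exiting.
# Trigger externally with: kill -USR1 <impala-shell pid>
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
```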








[jira] [Created] (IMPALA-13229) Improve logging for TAcceptQueueServer when a thread takes a long time in SASL negotiation

2024-07-16 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13229:
--

 Summary: Improve logging for TAcceptQueueServer when a thread 
takes a long time in SASL negotiation
 Key: IMPALA-13229
 URL: https://issues.apache.org/jira/browse/IMPALA-13229
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


In IMPALA-11653, we are concerned about bad clients that use up threads in the 
SASL negotiation thread pool for long periods of time (or eventually hit 
sasl_connect_tcp_timeout_ms).

As a separate task, it would be useful to be able to quickly tell from the logs 
whether a connection spends a lot of time in the SASL negotiation and could be 
creating this type of problem.

We should add some logging to make this issue clear from the logs. One option 
is to log a warning if SASL negotiation takes longer than some threshold (and 
thus was using up a thread during that time). If SASL negotiation is taking 
longer than a few seconds, that can be a real issue.
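The threshold idea could look roughly like the following sketch (in Python for brevity; the real change would live in the C++ TAcceptQueueServer, and the 5-second default threshold is a made-up placeholder):

```python
import logging
import time

def negotiate_with_warning(negotiate, warn_threshold_s=5.0):
    """Run a SASL-negotiation callback, warning if it hogged a pool thread."""
    start = time.monotonic()
    try:
        return negotiate()
    finally:
        elapsed = time.monotonic() - start
        if elapsed > warn_threshold_s:
            # A slow negotiation held one of the limited negotiation-pool
            # threads the whole time, so make it visible in the logs.
            logging.warning("SASL negotiation took %.1fs (threshold %.1fs)",
                            elapsed, warn_threshold_s)
```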








[jira] [Commented] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured

2024-07-15 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866157#comment-17866157
 ] 

Joe McDonnell commented on IMPALA-13202:


It seems like one path would be for Kudu client to add this as a configuration 
in KuduClientBuilder and then Impala could specify the value there. That is how 
we usually pass in configuration parameters for the Kudu client. See 
[https://github.com/apache/impala/blob/master/be/src/exec/kudu/kudu-util.cc#L85-L104]
 . I think it is good to have these things as part of the client API rather 
than setting global variables. I think it is good that Kudu client's flags are 
hidden and can't be set.

My understanding is that Impala's rpc_max_message_size parameter was intended 
to apply to Impala-to-Impala communication, not Impala-to-Kudu communication.

> KRPC flags used by libkudu_client.so can't be configured
> 
>
> Key: IMPALA-13202
> URL: https://issues.apache.org/jira/browse/IMPALA-13202
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: data.parquet
>
>
> The way Impala integrates with KRPC is by porting the KRPC code into the Impala 
> code base. Flags and methods of KRPC are defined as GLOBAL in the impalad 
> executable. libkudu_client.so is also compiled from the same KRPC code and has 
> duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the 
> symbol for kudu::rpc::InboundTransfer::ReceiveBuffer() 
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>  8: 022f5c88  1936 FUNCGLOBAL DEFAULT   13 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNCGLOBAL DEFAULT   13 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so
>  | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNCLOCAL  DEFAULT   12 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNCLOCAL  HIDDEN12 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*) 
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad 
> executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep 
> FLAGS_rpc_max_message_size
>  14380: 06006738 8 OBJECT  GLOBAL DEFAULT   30 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741 1 OBJECT  GLOBAL DEFAULT   30 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741 1 OBJECT  GLOBAL DEFAULT   30 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738 8 OBJECT  GLOBAL DEFAULT   30 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so
>  | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1 1 OBJECT  LOCAL  HIDDEN27 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8 8 OBJECT  LOCAL  DEFAULT   27 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own methods and flags. The flags are HIDDEN, so they 
> can't be modified by Impala code. E.g. IMPALA-4874 bumps 
> FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but the HIDDEN variable 
> FLAGS_rpc_max_message_size used in libkudu_client.so is still the default 
> value 50MB (52428800). We've seen error messages like this in the master 
> branch:
> {code:java}
> I0708 10:23:31.784974  2943 meta_cache.cc:294] 
> c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: 
> replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: 
> Network error: TS failed: RPC frame had a length of 53477464, but we only 
> support messages up to 52428800 bytes long.{code}
> CC [~joemcdonnell] [~wzhou] [~aserbin] 
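The failure mode in the quoted log can be illustrated with a sketch of the receiver-side frame-length check. This is a Python illustration, not the actual KRPC C++ transfer code; only the two size constants (the 50MB default and Impala's 2GB override) come from the report above.

```python
# Default KRPC limit vs. the value IMPALA-4874 raises it to in RpcMgr::Init().
DEFAULT_RPC_MAX_MESSAGE_SIZE = 50 * 1024 * 1024        # 52428800
IMPALA_RPC_MAX_MESSAGE_SIZE = 2 * 1024 * 1024 * 1024   # 2GB

def check_frame_length(frame_len, max_message_size):
    """Mirror of the receiver-side check that produced the quoted error."""
    if frame_len > max_message_size:
        raise ValueError(
            "RPC frame had a length of %d, but we only support messages up "
            "to %d bytes long." % (frame_len, max_message_size))
    return frame_len
```

A 53477464-byte frame passes under Impala's 2GB limit but fails under the hidden 50MB copy of the flag inside libkudu_client.so, which is exactly the discrepancy described above.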






[jira] [Work started] (IMPALA-12906) Incorporate run time scan range information into the tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12906 started by Joe McDonnell.
--
> Incorporate run time scan range information into the tuple cache key
> 
>
> Key: IMPALA-12906
> URL: https://issues.apache.org/jira/browse/IMPALA-12906
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The cache key for tuple caching currently doesn't incorporate information 
> about the scan ranges for the tables that it scans. This is important for 
> detecting changes in the table and having different cache keys for different 
> fragment instances that are assigned different scan ranges.
> To make this deterministic for mt_dop, we need mt_dop to assign scan ranges 
> deterministically to individual fragment instances rather than using the 
> shared queue introduced in IMPALA-9655.
> One way to implement this is to collect information about the scan nodes that 
> feed into the tuple cache and pass that information over to the tuple cache 
> node. At runtime, it can hash the scan ranges assigned to those scan nodes 
> and incorporate that into the cache key.
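The approach described above, hashing the assigned scan ranges into the compile-time key at runtime, can be sketched like this. The scan-range representation (path, offset, length) and function names are illustrative assumptions, not Impala's actual backend code.

```python
import hashlib

def tuple_cache_key(compile_time_key, scan_ranges):
    """Extend a compile-time cache key with the scan ranges assigned to this
    fragment instance. Sorting makes the hash independent of assignment order,
    so identical assignments always produce identical keys."""
    h = hashlib.sha256(compile_time_key.encode())
    for path, offset, length in sorted(scan_ranges):
        h.update(("%s:%d:%d" % (path, offset, length)).encode())
    return h.hexdigest()
```

Two fragment instances with different scan-range assignments get different keys, while re-running the same assignment reproduces the same key, which is why the mt_dop assignment must itself be deterministic.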






[jira] [Assigned] (IMPALA-12906) Incorporate run time scan range information into the tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12906:
--

Assignee: Joe McDonnell

> Incorporate run time scan range information into the tuple cache key
> 
>
> Key: IMPALA-12906
> URL: https://issues.apache.org/jira/browse/IMPALA-12906
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The cache key for tuple caching currently doesn't incorporate information 
> about the scan ranges for the tables that it scans. This is important for 
> detecting changes in the table and having different cache keys for different 
> fragment instances that are assigned different scan ranges.
> To make this deterministic for mt_dop, we need mt_dop to assign scan ranges 
> deterministically to individual fragment instances rather than using the 
> shared queue introduced in IMPALA-9655.
> One way to implement this is to collect information about the scan nodes that 
> feed into the tuple cache and pass that information over to the tuple cache 
> node. At runtime, it can hash the scan ranges assigned to those scan nodes 
> and incorporate that into the cache key.






[jira] [Assigned] (IMPALA-12817) Introduce basic intermediate result caching to speed similar queries

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12817:
--

Assignee: Joe McDonnell

> Introduce basic intermediate result caching to speed similar queries
> 
>
> Key: IMPALA-12817
> URL: https://issues.apache.org/jira/browse/IMPALA-12817
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> This tracks the first phase of intermediate result caching.
> The goals of the initial phase are to introduce a basic framework for caching 
> tuples at various points in the plan. The first location that needs to work 
> is immediately above an HdfsScanNode. Caching will use a local SSD to store 
> the cache.






[jira] [Created] (IMPALA-13188) Add test that compute stats does not result in a different tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13188:
--

 Summary: Add test that compute stats does not result in a 
different tuple cache key
 Key: IMPALA-13188
 URL: https://issues.apache.org/jira/browse/IMPALA-13188
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


If someone runs "compute stats" on the underlying tables for a query, the tuple 
cache key should only change if the plan actually changes. The resource 
estimates should not be incorporated into the tuple cache key as they have no 
semantic impact. The code already excludes the resource estimates from the key 
for the PlanNode, but we should have tests for computing stats and verifying 
that the key doesn't change.
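The property the test should verify, that resource estimates are excluded from the PlanNode's key contribution, can be sketched as follows. The field names here are illustrative, not Impala's actual thrift fields.

```python
import hashlib
import json

def plan_node_cache_key(plan_node):
    """Hash only the semantically meaningful plan fields, dropping resource
    estimates so that recomputed stats alone cannot change the key."""
    relevant = {k: v for k, v in plan_node.items()
                if k not in ("mem_estimate", "cardinality_estimate")}
    return hashlib.sha256(
        json.dumps(relevant, sort_keys=True).encode()).hexdigest()
```

Running "compute stats" changes only the estimate fields, so the key stays the same; an actual plan change (e.g. a different table) changes it.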








[jira] [Created] (IMPALA-13186) Tuple cache keys should incorporate information about related query options

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13186:
--

 Summary: Tuple cache keys should incorporate information about 
related query options
 Key: IMPALA-13186
 URL: https://issues.apache.org/jira/browse/IMPALA-13186
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Currently, the tuple cache key does not include information from the query 
options. Many query options have no impact on the result of a query (e.g. 
idle_session_timeout) or are evaluated purely on the coordinator during 
planning (e.g. broadcast_bytes_limit). 

However, some query options can impact behavior either by controlling how 
certain things are calculated (e.g. decimal_v2) or controlling what conditions 
result in an error. Changing a query option can change the output of a query.

We need some way to incorporate the relevant query options into the tuple cache 
key so there is no correctness issue.
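One way to incorporate only the result-affecting options, as described above, is to hash a curated subset of the query options into the key. The split between affecting and non-affecting options below is illustrative; the real list would have to be maintained per option.

```python
import hashlib

# Illustrative split; decimal_v2 and abort_on_error can change results,
# while options like idle_session_timeout cannot.
RESULT_AFFECTING_OPTIONS = {"decimal_v2", "abort_on_error"}

def cache_key_with_options(plan_key, query_options):
    """Fold only result-affecting query options into the tuple cache key."""
    relevant = sorted((k, v) for k, v in query_options.items()
                      if k in RESULT_AFFECTING_OPTIONS)
    h = hashlib.sha256(plan_key.encode())
    for k, v in relevant:
        h.update(("%s=%s" % (k, v)).encode())
    return h.hexdigest()
```

Changing an irrelevant option leaves the key (and cache hit rate) untouched, while flipping a result-affecting option like decimal_v2 forces a new key.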








[jira] [Created] (IMPALA-13185) Tuple cache keys need to incorporate runtime filter information

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13185:
--

 Summary: Tuple cache keys need to incorporate runtime filter 
information
 Key: IMPALA-13185
 URL: https://issues.apache.org/jira/browse/IMPALA-13185
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


If a runtime filter impacts the results of a fragment, then the tuple cache key 
needs to incorporate information about the generation of that runtime filter. 
This needs to include information about the base tables that impact the runtime 
filter.

For example, suppose there is a join. The build side of the join produces a 
runtime filter that gets delivered to the probe side of the join. The tuple 
cache key for the probe side of the join will need to include a representation 
of the runtime filter. If the table on the build side of the join changes, the 
tuple cache key for the probe side needs to change due to the possible 
difference in the runtime filter.

This can also impact eligibility. In theory, the build side of a join could be 
constructed from a source with a limit specified, and this can result in 
non-determinism. Since the build of the runtime filter is not deterministic, 
the consumer of the runtime filter is not deterministic and can't participate 
in tuple caching.
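The build-side dependency described above can be sketched by folding the identity and version of the build-side source tables into the probe side's key. Representing a table's state as a (name, version) pair is an assumption for this sketch, not Impala's actual metadata model.

```python
import hashlib

def probe_side_cache_key(probe_plan_key, build_side_tables):
    """Fold the identity and version of the build-side source tables into the
    probe side's key, so any build-side change (and hence any change to the
    runtime filter it produces) invalidates cached probe-side results."""
    h = hashlib.sha256(probe_plan_key.encode())
    for table, version in sorted(build_side_tables):
        h.update(("%s@%d" % (table, version)).encode())
    return h.hexdigest()
```

If the build-side table advances to a new version, the probe side's key changes even though the probe side's own plan and tables did not.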








[jira] [Created] (IMPALA-13181) Disable tuple caching for locations that have a limit

2024-06-25 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13181:
--

 Summary: Disable tuple caching for locations that have a limit
 Key: IMPALA-13181
 URL: https://issues.apache.org/jira/browse/IMPALA-13181
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Statements that use a limit are non-deterministic unless there is a sort. 
Locations with limits should be marked ineligible for tuple caching.

As an example, for a hash join, suppose the build side has a limit. This means 
that the build side could vary from run to run. A requirement for our 
correctness is that all nodes agree on the contents of the build side. The 
variability of the limit is a problem for the build side, because if one node 
hits the cache and another does not, there is no guarantee that they agree on 
the contents of the build side.

Concrete example: 
{noformat}
select a.l_orderkey from (select l_orderkey from tpch_parquet.lineitem limit 
10) a, tpch_parquet.orders b where a.l_orderkey = b.o_orderkey;{noformat}
There are times when limits are deterministic or the non-determinism is 
harmless. It is safer to ban it completely at first. In a future change, this 
rule can be relaxed to allow caching in those cases.
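The eligibility rule can be sketched as a bottom-up walk over the plan tree: any node with an unsorted limit, and any node above it, is marked ineligible. The dict-based plan representation is an illustrative assumption, not Impala's planner data structures.

```python
def mark_cache_eligibility(plan_node):
    """Mark a node ineligible for tuple caching if it has a limit without a
    sort, or if any of its inputs is ineligible. Returns the node's
    eligibility."""
    children = plan_node.get("children", [])
    # Evaluate all children eagerly so every node gets marked.
    child_ok = all([mark_cache_eligibility(c) for c in children])
    has_unsorted_limit = (plan_node.get("limit") is not None
                          and not plan_node.get("sorted", False))
    plan_node["cache_eligible"] = child_ok and not has_unsorted_limit
    return plan_node["cache_eligible"]
```

In the lineitem/orders example above, the inline view's limit makes the build side ineligible, which propagates up and makes the join itself ineligible.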








[jira] [Created] (IMPALA-13179) Disable tuple caching when using non-deterministic functions

2024-06-25 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13179:
--

 Summary: Disable tuple caching when using non-deterministic 
functions
 Key: IMPALA-13179
 URL: https://issues.apache.org/jira/browse/IMPALA-13179
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Some functions are non-deterministic, so tuple caching needs to detect those 
functions and avoid caching at locations that are non-deterministic.

There are two different pieces:
 # Correctness: If the key is constant but the results can be variable, then 
that is a correctness issue. That can happen for genuinely random functions 
like uuid(). It can happen when timestamp functions like now() are evaluated at 
runtime.
 # Performance: The frontend does constant-folding of functions that don't vary 
during execution, so something like now() might be replaced by a hard-coded 
integer. This means that the key contains something that varies frequently. 
That can be a performance issue, because we could be caching things that cannot 
be reused. This doesn't have the same correctness issue.

This ticket is focused on the correctness piece. If uuid()/now()/etc are referenced 
and would be evaluated at runtime, the location should be ineligible for tuple 
caching.
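The detection can be sketched as a walk over an expression tree that consults a set of known non-deterministic builtins. The set contents and the tuple-based expression encoding are illustrative assumptions; in Impala the builtins would be annotated in the function registry.

```python
# Illustrative set of non-deterministic builtins.
NON_DETERMINISTIC_FNS = {"uuid", "rand", "random", "now", "current_timestamp"}

def expr_is_deterministic(expr):
    """expr is a nested tuple: (function_name, arg, arg, ...). A plan location
    is tuple-cache eligible only if every function evaluated at runtime is
    deterministic."""
    fn, *args = expr
    if fn in NON_DETERMINISTIC_FNS:
        return False
    return all(expr_is_deterministic(a) for a in args if isinstance(a, tuple))
```

A single uuid() or runtime-evaluated now() anywhere in the expression makes the whole location ineligible.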








[jira] [Resolved] (IMPALA-12541) Compile toolchain GCC with --enable-linker-build-id to add Build ID to binaries

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12541.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Compile toolchain GCC with --enable-linker-build-id to add Build ID to 
> binaries
> ---
>
> Key: IMPALA-12541
> URL: https://issues.apache.org/jira/browse/IMPALA-12541
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> A "Build ID" is a unique identifier for binaries (which is a hash of the 
> contents). Producing OS packages with separate debug symbols requires each 
> binary to have a Build ID. This is particularly important for libstdc++, 
> because it is produced during the native-toolchain build rather than the 
> regular Impala build. To turn on Build IDs, one can configure that at GCC 
> build time by specifying "--enable-linker-build-id". This causes GCC to tell 
> the linker to compute the Build ID.
> Breakpad will also use the Build ID when resolving symbols.






[jira] [Commented] (IMPALA-12541) Compile toolchain GCC with --enable-linker-build-id to add Build ID to binaries

2024-06-25 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859971#comment-17859971
 ] 

Joe McDonnell commented on IMPALA-12541:


{noformat}
commit e78b0ef34241218cda7eac3b526cb6a824596df1
Author: Joe McDonnell 
Date:   Fri Nov 3 14:18:47 2023 -0700

    IMPALA-12541: Build GCC with --enable-linker-build-id
    
    This builds GCC with --enable-linker-build-id so that
    binaries have Build ID specified. Build ID is needed to
    produce OS packages with separate debuginfo. This is
    particularly important for libstdc++, because it is
    not built as part of the regular Impala build.
    
    Testing:
     - Verified that resulting binaries have .note.gnu.build-id
    
    Change-Id: Ieb2017ba1a348a9e9e549fa3268635afa94ae6d0
    Reviewed-on: http://gerrit.cloudera.org:8080/21469
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Compile toolchain GCC with --enable-linker-build-id to add Build ID to 
> binaries
> ---
>
> Key: IMPALA-12541
> URL: https://issues.apache.org/jira/browse/IMPALA-12541
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> A "Build ID" is a unique identifier for binaries (which is a hash of the 
> contents). Producing OS packages with separate debug symbols requires each 
> binary to have a Build ID. This is particularly important for libstdc++, 
> because it is produced during the native-toolchain build rather than the 
> regular Impala build. To turn on Build IDs, one can configure that at GCC 
> build time by specifying "--enable-linker-build-id". This causes GCC to tell 
> the linker to compute the Build ID.
> Breakpad will also use the Build ID when resolving symbols.








[jira] [Commented] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-06-25 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859962#comment-17859962
 ] 

Joe McDonnell commented on IMPALA-13121:


{noformat}
commit b9167e985c69fd321e9e25e5ae0c7747682f06f6
Author: Joe McDonnell 
Date:   Fri May 31 15:20:20 2024 -0700

    IMPALA-13121: Switch to ccache 3.7.12
    
    The docker images currently build and use ccache 3.3.3.
    Recently, we ran into a case where debuginfo was being
    generated even though the cflags ended with -g0. The
    ccache release history has this note for 3.3.5:
     - Fixed a regression where the original order of
       debug options could be lost.
    
    This upgrades ccache to 3.7.12 to address this issue.
    
    Ccache 3.7.12 is the last ccache release that builds
    using autotools. Ccache 4 moves to build with CMake.
    Adding a CMake dependency would be complicated at this
    stage, because some of the older OSes don't provide a
    new enough CMake in the package repositories. Since we
    don't really need the new features of Ccache 4+, this
    sticks with 3.7.12 for now.
    
    This reenables the check_ccache_works() logic in
    assert-dependencies-present.py.
    
    Testing:
     - Built docker images and ran a toolchain build
     - The newer ccache resolves the unexpected debuginfo issue
    
    Change-Id: I90d751445daa0dc298b634c1049d637a14afac40
    Reviewed-on: http://gerrit.cloudera.org:8080/21473
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.






[jira] [Resolved] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13121.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.








[jira] [Resolved] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13146.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.5.0
>
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> 01:37:16                                  Dload  Upload   Total   Spent    Left  Speed
> 01:37:16 
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
>   0 21.5M    0   902    0     0    293      0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.
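Until a mirror exists, a retrying download wrapper is one way to ride out these transient failures. A hedged sketch (the helper name and retry counts are illustrative, not taken from Impala's scripts):

```shell
# fetch_with_retry: download a URL with curl, retrying transient failures
# such as the truncated transfer above. On curl 7.71+, --retry-all-errors
# could be added to also retry errors curl does not classify as transient.
fetch_with_retry() {
  url="$1"; dest="$2"
  curl --fail --location --silent --show-error \
       --retry 5 --retry-delay 10 -o "$dest" "$url"
}

# Illustrative usage against the upstream NodeJS dist layout:
# fetch_with_retry "https://nodejs.org/dist/v16.20.2/node-v16.20.2-linux-x64.tar.gz" node.tar.gz
```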



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-13136) Refactor AnalyzedFunctionCallExpr

2024-06-12 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854526#comment-17854526
 ] 

Joe McDonnell commented on IMPALA-13136:


[~scarlin] I'm ok with punting on this for a while. We have a long list of 
things that need to land, and this is more about code cleanliness than 
functionality.

> Refactor AnalyzedFunctionCallExpr
> -
>
> Key: IMPALA-13136
> URL: https://issues.apache.org/jira/browse/IMPALA-13136
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Steve Carlin
>Priority: Major
>
> Copied from code review:
> The part where we immediately analyze as part of the constructor makes for 
> complicated exception handling. RexVisitor doesn't support exceptions, so it 
> adds complication to handle them under those circumstances. I can't really 
> explain why it is necessary.
> Let me sketch out an alternative:
> 1. Construct the whole Expr tree without analyzing it
> 2. Any errors that happen during this process are not usually actionable by 
> the end user. It's good to have a descriptive error message, but it doesn't 
> mean there is something wrong with the SQL. I think that it is ok for this 
> code to throw subclasses of RuntimeException or use 
> Preconditions.checkState() with a good explanation.
> 3. When we get the Expr tree back in CreateExprVisitor::getExpr(), we call 
> analyze() on the root node, which does a recursive analysis of the whole tree.
> 4. The special Expr classes don't run analyze() in the constructor, don't 
> keep a reference to the Analyzer, and don't override resetAnalysisState(). 
> They override analyzeImpl() and they should be idempotent. The clone 
> constructor should not need to do anything special, just do a deep copy.
> I don't want to bog down this review. If we want to address this as a 
> followup, I can live with that, but I don't want us to go too far down this 
> road. (Or if we have a good explanation for why it is necessary, then we can 
> write a good comment and move on.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13151:
--

Assignee: Michael Smith

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13151:
--

 Summary: DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on 
ARM
 Key: IMPALA-13151
 URL: https://issues.apache.org/jira/browse/IMPALA-13151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
failing with errors like this:
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
actual: 269834 vs 30{noformat}
So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Assigned] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13146:
--

Assignee: Joe McDonnell

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> 01:37:16                                  Dload  Upload   Total   Spent    Left  Speed
> 01:37:16 
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
>   0 21.5M    0   902    0     0    293      0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13147) Add support for limiting the concurrency of link jobs

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13147:
--

 Summary: Add support for limiting the concurrency of link jobs
 Key: IMPALA-13147
 URL: https://issues.apache.org/jira/browse/IMPALA-13147
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Link jobs can use a lot of memory due to the amount of debug info. The level of 
concurrency that is useful for compilation can be too high for linking. Running 
a link-heavy command like buildall.sh -skiptests can run out of memory from 
linking all of the backend tests / benchmarks.

It would be useful to be able to limit the number of concurrent link jobs. 
There are two basic approaches:

When using the ninja generator for CMake, ninja supports having job pools with 
limited parallelism. CMake has support for mapping link tasks to their own 
pool. Here is an example:
{noformat}
set(CMAKE_JOB_POOLS compilation_pool=24 link_pool=8)
set(CMAKE_JOB_POOL_COMPILE compilation_pool)
set(CMAKE_JOB_POOL_LINK link_pool){noformat}
The makefile generator does not have equivalent functionality, but we could do a 
more limited version where buildall.sh splits -skiptests into two make 
invocations: the first does all the compilation with full parallelism 
(equivalent to -notests), and the second builds the backend tests / benchmarks 
with reduced parallelism.
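The two-invocation idea can be sketched as follows. This is a hedged illustration using a toy Makefile; "compile_all" and "link_tests" are stand-ins, not real Impala targets:

```shell
# Toy Makefile standing in for the Impala build (target names are
# placeholders for illustration only).
printf 'compile_all:\n\t@echo compile phase\nlink_tests:\n\t@echo link phase\n' > /tmp/demo.mk

# Phase 1: compilation with full parallelism (the -notests equivalent).
make -f /tmp/demo.mk -j"$(nproc)" compile_all
# Phase 2: link-heavy backend tests / benchmarks with reduced parallelism.
make -f /tmp/demo.mk -j4 link_tests
```

With the ninja generator, the CMAKE_JOB_POOLS settings shown above achieve the same split within a single build invocation.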



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13146:
--

 Summary: Javascript tests sometimes fail to download NodeJS
 Key: IMPALA-13146
 URL: https://issues.apache.org/jira/browse/IMPALA-13146
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


For automated tests, sometimes the Javascript tests fail to download NodeJS:
{noformat}
01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
01:37:16   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
01:37:16                                  Dload  Upload   Total   Spent    Left  Speed
01:37:16 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0 21.5M    0   902    0     0    293      0 21:23:04  0:00:03 21:23:01   293
...
 30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to read{noformat}
If this keeps happening, we should mirror the NodeJS binary on the 
native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13145) Upgrade mold linker to 2.31.0

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13145:
--

 Summary: Upgrade mold linker to 2.31.0
 Key: IMPALA-13145
 URL: https://issues.apache.org/jira/browse/IMPALA-13145
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Mold 2.31.0 claims performance improvements and a reduction in the memory 
needed for linking. See [https://github.com/rui314/mold/releases/tag/v2.31.0] 
and 
[https://github.com/rui314/mold/commit/53ebcd80d888778cde16952270f73343f090f342]

We should move to that version as some developers are seeing issues with high 
memory usage for linking.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-12967) Testcase fails at test_migrated_table_field_id_resolution due to "Table does not exist"

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853224#comment-17853224
 ] 

Joe McDonnell commented on IMPALA-12967:


There is a separate symptom where this test fails with a Disk I/O error. It is 
probably somewhat related, so we need to decide whether to include that symptom 
here. See IMPALA-13144.

> Testcase fails at test_migrated_table_field_id_resolution due to "Table does 
> not exist"
> ---
>
> Key: IMPALA-12967
> URL: https://issues.apache.org/jira/browse/IMPALA-12967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> Testcase test_migrated_table_field_id_resolution fails at exhaustive release 
> build with following messages:
> *Regression*
> {code:java}
> query_test.test_iceberg.TestIcebergTable.test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
>  "iceberg_migrated_alter_test_orc", "orc") common/file_utils.py:68: in 
> create_iceberg_table_from_directory file_format)) 
> common/impala_connection.py:215: in execute 
> fetch_profile_after_close=fetch_profile_after_close) 
> beeswax/impala_beeswax.py:191: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:384: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:  E   CAUSED BY: IcebergTableLoadingException: Table does not exist 
> at location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> Stacktrace
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
> "iceberg_migrated_alter_test_orc", "orc")
> common/file_utils.py:68: in create_iceberg_table_from_directory
> file_format))
> common/impala_connection.py:215: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:ImpalaRuntimeException: Error making 'createTable' RPC to 
> Hive Metastore: 
> E   CAUSED BY: IcebergTableLoadingException: Table does not exist at 
> location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> {code}
> *Standard Error*
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_b59d79db` 
> CASCADE;
> -- 2024-04-02 00:56:55,137 INFO MainThread: Started query 
> f34399a8b7cddd67:031a3b96
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_b59d79db`;
> -- 2024-04-02 00:56:57,302 INFO MainThread: Started query 
> 94465af69907eac5:e33f17e0
> -- 2024-04-02 00:56:57,353 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_b59d79db" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up 

[jira] [Commented] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853223#comment-17853223
 ] 

Joe McDonnell commented on IMPALA-13144:


We need to decide whether we want to track this with IMPALA-12967 (which was 
originally about "Table does not exist at location" on the same test) or keep 
it separate.

> TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O 
> error
> --
>
> Key: IMPALA-13144
> URL: https://issues.apache.org/jira/browse/IMPALA-13144
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> A couple test jobs hit a failure on 
> TestIcebergTable.test_migrated_table_field_id_resolution:
> {noformat}
> query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
> vector, unique_database)
> common/impala_test_suite.py:725: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:216: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
> open HDFS file 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
> E   Error(2): No such file or directory
> E   Root cause: RemoteException: File does not exist: 
> /test-warehouse/iceberg_migrated_alter_test/00_0
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
> E at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
> E at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> E at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13144:
--

 Summary: TestIcebergTable.test_migrated_table_field_id_resolution 
fails with Disk I/O error
 Key: IMPALA-13144
 URL: https://issues.apache.org/jira/browse/IMPALA-13144
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A couple test jobs hit a failure on 
TestIcebergTable.test_migrated_table_field_id_resolution:
{noformat}
query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
vector, unique_database)
common/impala_test_suite.py:725: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:660: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:1013: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:216: in execute
fetch_profile_after_close=fetch_profile_after_close)
beeswax/impala_beeswax.py:191: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:384: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:405: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error on 
impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
open HDFS file 
hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
E   Error(2): No such file or directory
E   Root cause: RemoteException: File does not exist: 
/test-warehouse/iceberg_migrated_alter_test/00_0
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
E   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
E   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
E   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
E   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
E   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
E   at java.security.AccessController.doPrivileged(Native Method)
E   at javax.security.auth.Subject.doAs(Subject.java:422)
E   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
E   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13143:
--

 Summary: TestCatalogdHA.test_catalogd_failover_with_sync_ddl times 
out expecting query failure
 Key: IMPALA-13143
 URL: https://issues.apache.org/jira/browse/IMPALA-13143
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
intermittently with:
{noformat}
custom_cluster/test_catalogd_ha.py:472: in test_catalogd_failover_with_sync_ddl
self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
common/impala_test_suite.py:1216: in wait_for_state
self.wait_for_any_state(handle, [expected_state], timeout, client)
common/impala_test_suite.py:1234: in wait_for_any_state
raise Timeout(timeout_msg)
E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of the 
expected states [5], last known state 4{noformat}
This means the query succeeded even though we expected it to fail. This is 
currently limited to s3 jobs. In a different test, we saw issues because s3 is 
slower (see IMPALA-12616).

This test was introduced by IMPALA-13134: 
https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50








[jira] [Resolved] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12616.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

I think the s3 slowness version of this is fixed, so I'm going to resolve this.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}








[jira] [Updated] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13139:
---
Description: 
When debugging TestRestart, I noticed that the debug_action set for one query 
stayed in effect for subsequent queries that didn't specify query_options.
{noformat}
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(
        query, query_options={"debug_action": DEBUG_ACTION})

    ...

    # debug_action is still set for these queries:
    self.execute_query_expect_success(
        self.client, "select age from {}".format(tbl_name))
    self.execute_query_expect_success(
        self.client, "alter table {} add columns (name string)".format(tbl_name))
    self.execute_query_expect_success(
        self.client, "select name from {}".format(tbl_name)){noformat}
There is a way to clear the query options (self.client.clear_configuration()), 
but this is an odd behavior. It's unclear if some tests rely on this behavior.
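For illustration, option scoping could be made explicit with a small context manager. This is only a sketch: `FakeClient` and `set_configuration_option` are hypothetical stand-ins for the real test client; only `clear_configuration()` comes from the description above.

```python
from contextlib import contextmanager

@contextmanager
def scoped_query_options(client, options):
    # Apply query options for the enclosed queries only, then clear them so
    # later queries don't silently inherit debug_action and friends.
    try:
        for name, value in options.items():
            client.set_configuration_option(name, value)
        yield client
    finally:
        client.clear_configuration()

class FakeClient(object):
    # Minimal stand-in for the Impala test client; only models the option map.
    def __init__(self):
        self.options = {}

    def set_configuration_option(self, name, value):
        self.options[name] = value

    def clear_configuration(self):
        self.options.clear()

client = FakeClient()
with scoped_query_options(client, {"debug_action": "SLEEP@1000"}):
    assert client.options["debug_action"] == "SLEEP@1000"
assert client.options == {}  # options no longer leak into later queries
```

The `finally` clause matters: even if a query inside the block raises, the options are cleared, so a failing test can't poison the queries of the tests that run after it.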

> Query options set via ImpalaTestSuite::execute_query_expect_success stay set 
> for subsequent queries
> ---
>
> Key: IMPALA-13139
> URL: https://issues.apache.org/jira/browse/IMPALA-13139
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> When debugging TestRestart, I noticed that the debug_action set for one query 
> stayed in effect for subsequent queries that didn't specify query_options.
> {noformat}
>     DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
>                     .format(debug_action_sleep_time_sec * 1000))
>     query = "alter table {} add columns (age int)".format(tbl_name)
>     handle = self.execute_query_async(
>         query, query_options={"debug_action": DEBUG_ACTION})
> ...
>     # debug_action is still set for these queries:
>     self.execute_query_expect_success(
>         self.client, "select age from {}".format(tbl_name))
>     self.execute_query_expect_success(
>         self.client, "alter table {} add columns (name string)".format(tbl_name))
>     self.execute_query_expect_success(
>         self.client, "select name from {}".format(tbl_name)){noformat}
> There is a way to clear the query options 
> (self.client.clear_configuration()), but this is an odd behavior. It's 
> unclear if some tests rely on this behavior.






[jira] [Created] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13139:
--

 Summary: Query options set via 
ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries
 Key: IMPALA-13139
 URL: https://issues.apache.org/jira/browse/IMPALA-13139
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell











[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-06 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852961#comment-17852961
 ] 

Joe McDonnell commented on IMPALA-12616:


This is looking timing-related. I was able to get this to pass by adjusting 
some of the sleep times. Basically, it looks like the catalog is slower on s3 
and some operations don't finish in the time we thought they would.

 
{noformat}
    debug_action_sleep_time_sec = 10 (NEW: 30)
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(
        query, query_options={"debug_action": DEBUG_ACTION})

    # Wait a bit so the RPC from the catalogd arrives to the coordinator.
    time.sleep(0.5) (NEW: 5)

    self.cluster.catalogd.restart()

    # Wait for the query to finish.
    max_wait_time = (debug_action_sleep_time_sec
        + self.WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC + 10)
    self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"],
                        max_wait_time){noformat}
A successful timeline looks like this:

 
 # Submit an alter table that sleeps before processing the catalog update
 # Sleep a little bit so the catalog knows about the alter table
 # Restart the catalogd
 # The catalog sends an update via the statestore. This has the new catalog ID 
and causes this message: "There was an error processing the impalad catalog 
update. Requesting a full topic update to recover: CatalogException: Detected 
catalog service ID changes from 9c9f7ff13f0e4f72:a896bee4d52fd37e to 
da67610b2c304198:a05daf1bc3d6a4b3. Aborting updateCatalog()"
 # The catalogd sends a full topic update
 # The alter table wakes up and prints this message: Catalog service ID 
mismatch. Current ID: da67610b2c304198:a05daf1bc3d6a4b3. ID in response: 
9c9f7ff13f0e4f72:a896bee4d52fd37e. Catalogd may have been restarted. Waiting 
for new catalog update from statestore.
 # Either it times out or there are too many non-empty updates, and the alter 
table bails out with "W0506 22:42:10.316627 23066 impala-server.cc:2369] 
e14b23a22458ab75:6b269414] Ignoring catalog update result of catalog 
service ID 9c9f7ff13f0e4f72:a896bee4d52fd37e because it does not match with 
current catalog service ID da67610b2c304198:a05daf1bc3d6a4b3. The current 
catalog service ID may be stale (this may be caused by the catalogd having been 
restarted more than once) or newer than the catalog service ID of the update 
result."

If the alter table wakes up from its sleep before #5 happens, the alter table 
will see the catalog service ID change and fail. To avoid that, we adjust the 
WAIT_BEFORE_PROCESSING_CATALOG_UPDATE higher. I also lengthened the sleep in #2 
to give the initial catalog some extra time to hear about the alter table. The 
test verifies that the logs contain the expected messages, so this should be a 
safe modification to the test.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Created] (IMPALA-13132) Ozone jobs see intermittent termination of Ozone manager / HMS fails to start

2024-06-04 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13132:
--

 Summary: Ozone jobs see intermittent termination of Ozone manager 
/ HMS fails to start
 Key: IMPALA-13132
 URL: https://issues.apache.org/jira/browse/IMPALA-13132
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Ozone jobs load data/metadata snapshots during dataload, then restart the 
cluster. On this restart, the HMS sometimes fails to come up:
{noformat}
16:04:13  --> Starting Hive Metastore Service
16:04:13 No handlers could be found for logger "thrift.transport.TSocket"
16:04:14 Waiting for the Metastore at localhost:9083...
...
16:09:14 Waiting for the Metastore at localhost:9083...
16:09:14 Metastore service failed to start within 300.0 seconds.{noformat}
In the metastore logs, we see messages like this:
{noformat}
2024-06-04T08:37:06,425  INFO [main] retry.RetryInvocationHandler: 
com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
hostname/127.0.0.1 to localhost:9862 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
$Proxy31.submitRequest over nodeId=null,nodeAddress=localhost:9862 after 1 
failover attempts. Trying to failover after sleeping for 4000ms.{noformat}
It's trying to talk to the Ozone manager. The Ozone cluster was back up and 
running before trying to start the HMS, but then the Ozone manager received a 
signal and shut down:
{noformat}
24/06/04 08:36:37 ERROR om.OzoneManagerStarter: RECEIVED SIGNAL 15: SIGTERM
24/06/04 08:36:37 INFO om.OzoneManagerStarter: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down OzoneManager at hostname/127.0.0.1
/
24/06/04 08:36:37 INFO om.OzoneManager: om1[localhost:9862]: Stopping Ozone 
Manager{noformat}
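The five-minute Metastore wait quoted above amounts to polling a port until a deadline, which is why a dying Ozone manager surfaces only as an HMS startup timeout. A minimal sketch, assuming the wait only checks TCP reachability (the real start scripts may check more than that):

```python
import socket
import time

def wait_for_service(host, port, timeout_sec=300.0, interval_sec=1.0):
    # Poll until something accepts TCP connections on host:port.
    # Returns True on success, False once timeout_sec elapses, mirroring
    # "Metastore service failed to start within 300.0 seconds."
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_sec):
                return True
        except OSError:
            time.sleep(interval_sec)
    return False
```

If the HMS itself is stuck retrying a dead Ozone manager at localhost:9862, a loop like this never sees port 9083 open and times out exactly as in the log, hiding the real failure.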








[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-04 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851904#comment-17851904
 ] 

Joe McDonnell commented on IMPALA-12616:


I switched the code to use self.client.wait_for_finished_timeout(), which will 
stop if it reaches either FINISHED or EXCEPTION. Here is the error it hits:
{noformat}
custom_cluster/test_restart_services.py:238: in 
test_restart_catalogd_while_handling_rpc_response_with_timeout
finished = self.client.wait_for_finished_timeout(handle, max_wait_time)
common/impala_connection.py:247: in wait_for_finished_timeout
operation_handle.get_handle(), timeout)
beeswax/impala_beeswax.py:423: in wait_for_finished_timeout
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:CatalogException: Detected catalog service ID changes from 
b0019607521f4f0a:8340b9882af1a856 to a4f8584219b34182:9b3cf9af859a0d54. 
Aborting updateCatalog(){noformat}

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-03 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851840#comment-17851840
 ] 

Joe McDonnell commented on IMPALA-12616:


This is now failing pretty consistently on a variety of s3 jobs (but only s3 
jobs). I think the first thing we could do is modify wait_for_any_state() to 
detect the terminal state (EXCEPTION) and print the error. In general, it would 
be good for wait_for_state() to know about terminal states.
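A terminal-state-aware wait could look roughly like this. It is a sketch: `get_state` and the numeric state values are stand-ins for the real client API, with FINISHED as 4 and EXCEPTION as 5 to match the states quoted in these reports.

```python
import time

FINISHED, EXCEPTION = 4, 5  # hypothetical numeric states, as in the logs above

def wait_for_any_state(get_state, expected_states, timeout_sec,
                       terminal_states=(FINISHED, EXCEPTION)):
    # Poll get_state() until it returns an expected state. Unlike a plain
    # timeout loop, fail fast with the offending state as soon as the query
    # lands in an unexpected terminal state, instead of burning the timeout.
    deadline = time.time() + timeout_sec
    state = None
    while time.time() < deadline:
        state = get_state()
        if state in expected_states:
            return state
        if state in terminal_states:
            raise AssertionError(
                "query reached terminal state %s, expected one of %s"
                % (state, list(expected_states)))
        time.sleep(0.1)
    raise TimeoutError("last known state %s, expected one of %s"
                       % (state, list(expected_states)))
```

With a loop like this, the failure above would have reported the query's error immediately rather than after a 30-second timeout with only "last known state 5".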

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Commented] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851831#comment-17851831
 ] 

Joe McDonnell commented on IMPALA-13128:


It looks intermittent, so adding the "flaky" label.

> disk-file-test hangs on ARM + UBSAN test jobs
> -
>
> Key: IMPALA-13128
> URL: https://issues.apache.org/jira/browse/IMPALA-13128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
> this being the last output:
> {noformat}
> 23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   
> 43.42 sec
> 23:07:30 Start  64: disk-file-test
> 23:07:30 
> 18:47:00 
> 18:47:00  run-all-tests.sh TIMED OUT! {noformat}
> This has happened multiple times, but it looks limited to ARM + UBSAN. The 
> jobs take stack traces, but only of the running impalads / HMS.






[jira] [Created] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13128:
--

 Summary: disk-file-test hangs on ARM + UBSAN test jobs
 Key: IMPALA-13128
 URL: https://issues.apache.org/jira/browse/IMPALA-13128
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
this being the last output:
{noformat}
23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   43.42 
sec
23:07:30 Start  64: disk-file-test
23:07:30 
18:47:00 
18:47:00  run-all-tests.sh TIMED OUT! {noformat}
This has happened multiple times, but it looks limited to ARM + UBSAN. The jobs 
take stack traces, but only of the running impalads / HMS.








[jira] [Updated] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13128:
---
Labels: broken-build flaky  (was: broken-build)

> disk-file-test hangs on ARM + UBSAN test jobs
> -
>
> Key: IMPALA-13128
> URL: https://issues.apache.org/jira/browse/IMPALA-13128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
> this being the last output:
> {noformat}
> 23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   
> 43.42 sec
> 23:07:30 Start  64: disk-file-test
> 23:07:30 
> 18:47:00 
> 18:47:00  run-all-tests.sh TIMED OUT! {noformat}
> This has happened multiple times, but it looks limited to ARM + UBSAN. The 
> jobs take stack traces, but only of the running impalads / HMS.






[jira] [Resolved] (IMPALA-13127) custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs

2024-06-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13127.

Fix Version/s: Not Applicable
   Resolution: Duplicate

Fixed by a follow-up change in IMPALA-13040; closing as a duplicate.

> custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs
> -
>
> Key: IMPALA-13127
> URL: https://issues.apache.org/jira/browse/IMPALA-13127
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Not Applicable
>
>
> ASAN jobs have been intermittently hitting a failure in 
> custom_cluster.test_runtime_filter_aggregation.TestLateQueryStateInit.test_late_query_state_init():
> {noformat}
> custom_cluster/test_runtime_filter_aggregation.py:129: in 
> test_late_query_state_init
>     self.assert_log_contains('impalad_node1', 'INFO', log_pattern, expected)
> common/impala_test_suite.py:1383: in assert_log_contains
>     ", but found none." % (log_file_path, line_regex)
> E   AssertionError: Expected at least one line in file 
> /data0/jenkins/workspace/impala-cdwh-2024.0.18.0-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-077e.vpc.cloudera.com.jenkins.log.INFO.20240603-025918.3562162
>  matching regex 'UpdateFilterFromRemote RPC called with remaining wait time', 
> but found none.{noformat}
> Seen on both an ARM job and an x86_64 job, so it is probably not an 
> architecture-specific issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13127) custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13127:
--

 Summary: custom_cluster/test_runtime_filter_aggregation.py is 
failing on ASAN jobs
 Key: IMPALA-13127
 URL: https://issues.apache.org/jira/browse/IMPALA-13127
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


ASAN jobs have been intermittently hitting a failure in 
custom_cluster.test_runtime_filter_aggregation.TestLateQueryStateInit.test_late_query_state_init():
{noformat}
custom_cluster/test_runtime_filter_aggregation.py:129: in 
test_late_query_state_init
    self.assert_log_contains('impalad_node1', 'INFO', log_pattern, expected)
common/impala_test_suite.py:1383: in assert_log_contains
    ", but found none." % (log_file_path, line_regex)
E   AssertionError: Expected at least one line in file 
/data0/jenkins/workspace/impala-cdwh-2024.0.18.0-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-077e.vpc.cloudera.com.jenkins.log.INFO.20240603-025918.3562162
 matching regex 'UpdateFilterFromRemote RPC called with remaining wait time', 
but found none.{noformat}
Seen on both an ARM job and an x86_64 job, so it is probably not an 
architecture-specific issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13125) Set of tests for exploration_strategy=exhaustive varies between python 2 and 3

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13125:
--

 Summary: Set of tests for exploration_strategy=exhaustive varies 
between python 2 and 3
 Key: IMPALA-13125
 URL: https://issues.apache.org/jira/browse/IMPALA-13125
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


TLDR: Python 3 runs a different set of exhaustive tests than Python 2.

Longer version:

When looking into running Python 3 tests, I noticed that the set of tests 
running for the exhaustive tests is different for Python 2 vs Python 3. This 
was surprising.

It turns out there is a distinction between run-tests.py's 
--exploration_strategy=exhaustive option and the 
--workload_exploration_strategy="functional-query:exhaustive" option. The 
exhaustive job is actually using the latter. This means that individual 
functional-query workload classes see cls.exploration_strategy() == "exhaustive", 
but the logic that generates the test vector still sees 
exploration_strategy=core, so it still uses pairwise generation. Code:
{noformat}
    if exploration_strategy == 'exhaustive':
      return self.__generate_exhaustive_combinations()
    elif exploration_strategy in ['core', 'pairwise']:
      return self.__generate_pairwise_combinations(){noformat}
[https://github.com/apache/impala/blob/master/tests/common/test_vector.py#L165-L168]

Python 3 changes dictionary ordering relative to Python 2, which affects the 
order of test dimensions and therefore which combinations pairwise generation 
picks. So, the Python 3 exhaustive tests are different. This may expose latent 
bugs, because some combinations that meet the constraints have never actually 
been run (e.g. some json encodings don't have the decimal_tiny table).

We can work to make them behave similarly, using pytest's --collect-only option 
to look at the differences (and compare them to actual existing runs).
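The pairwise-vs-exhaustive distinction can be illustrated with a small, hypothetical sketch. This is not Impala's actual generator in tests/common/test_vector.py, just an illustration of why the two strategies yield different test sets and why iteration order matters:

```python
from itertools import product

# Hypothetical sketch of the two strategies: 'exhaustive' takes the full
# cross product of dimension values, while 'core'/'pairwise' greedily keeps
# only combinations that introduce an unseen (dimension, value) pair.
# Because the greedy pass depends on dimension/value iteration order, a
# Python 2 vs 3 change in dict ordering changes which tests get picked.
def generate_combinations(dimensions, exploration_strategy):
    names = list(dimensions)
    all_combos = [dict(zip(names, values))
                  for values in product(*(dimensions[n] for n in names))]
    if exploration_strategy == 'exhaustive':
        return all_combos
    elif exploration_strategy in ('core', 'pairwise'):
        seen_pairs, picked = set(), []
        for combo in all_combos:
            pairs = {frozenset([(a, combo[a]), (b, combo[b])])
                     for i, a in enumerate(names)
                     for b in names[i + 1:]}
            if pairs - seen_pairs:  # keep only if it covers a new pair
                seen_pairs |= pairs
                picked.append(combo)
        return picked
    raise ValueError(exploration_strategy)

dims = {'file_format': ['parquet', 'text', 'json'],
        'compression': ['none', 'snappy'],
        'batch_size': [0, 1]}
exhaustive = generate_combinations(dims, 'exhaustive')
core = generate_combinations(dims, 'core')  # strictly smaller set
```

Reordering the keys of `dims` changes which combinations the greedy pass keeps, which mirrors the Python 2 vs 3 divergence described above.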



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13124) Migrate tests that use the 'unittest' package to use normal pytest base class

2024-06-02 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13124:
--

 Summary: Migrate tests that use the 'unittest' package to use 
normal pytest base class
 Key: IMPALA-13124
 URL: https://issues.apache.org/jira/browse/IMPALA-13124
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Some tests use the 'unittest' package as the base class for their tests. 
These can be run by pytest, but when run with python 3, they fail 
with this message:
{noformat}
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/runner.py:150:
 in __init__
    self.result = func()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:435:
 in _memocollect
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:315:
 in _memoizedcall
    res = function()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:435:
 in 
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:605:
 in collect
    return super(Module, self).collect()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:459:
 in collect
    res = self.makeitem(name, obj)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:471:
 in makeitem
    collector=self, name=name, obj=obj)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:724:
 in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:338:
 in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:333:
 in 
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:595:
 in execute
    return _wrapped_call(hook_impl.function(*args), self.execute)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:249:
 in _wrapped_call
    wrap_controller.send(call_outcome)
E   RuntimeError: generator raised StopIteration{noformat}
Converting them to use the regular pytest base classes works fine with python 3 
(and also python 2).
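The conversion is mechanical; a hypothetical before/after sketch (the class names here are illustrative, not taken from the Impala test suite):

```python
import unittest

# Old style: inheriting from unittest.TestCase is what trips the pytest
# collection failure shown in the traceback above under Python 3.
class TestSplitOld(unittest.TestCase):
    def test_split(self):
        self.assertEqual("a,b".split(","), ["a", "b"])

# New style: a plain class that pytest collects by naming convention;
# bare asserts replace the self.assert* helper methods.
class TestSplitNew:
    def test_split(self):
        assert "a,b".split(",") == ["a", "b"]
```

Both forms check the same behavior; only the base class and assertion style change.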



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13123) Add a way to run tests with python 3

2024-06-02 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13123:
--

 Summary: Add a way to run tests with python 3
 Key: IMPALA-13123
 URL: https://issues.apache.org/jira/browse/IMPALA-13123
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


As a first step towards switching to python 3, we need an option to run the 
tests using the toolchain python 3. For example, there could be an environment 
variable that tells tests/run-tests.py and bin/impala-py.test to use python 3.

This can be combined with a first round of fixes to get a decent number of 
tests running and see what is broken. The fixes must be compatible with python 
2, and the default will still be python 2.
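A minimal sketch of such a switch, assuming a hypothetical variable name (IMPALA_USE_PYTHON3 is an assumption for illustration, not a name from the Impala scripts):

```shell
#!/bin/bash
# Hypothetical dispatch that wrappers like tests/run-tests.py and
# bin/impala-py.test could share: an environment variable (name assumed)
# selects the interpreter, defaulting to python2 so existing jobs keep
# their current behavior.
if [[ "${IMPALA_USE_PYTHON3:-false}" == "true" ]]; then
  PYTHON_EXE="python3"
else
  PYTHON_EXE="python2"
fi
echo "Running tests with ${PYTHON_EXE}"
```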



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Assigned] (IMPALA-12686) Build the toolchain with basic debug information (-g1)

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12686:
--

Assignee: Joe McDonnell

> Build the toolchain with basic debug information (-g1)
> --
>
> Key: IMPALA-12686
> URL: https://issues.apache.org/jira/browse/IMPALA-12686
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Currently, we build most of the toolchain without debug information and 
> without "-fno-omit-frame-pointer". This makes it difficult to get reliable 
> stack traces that go through some of those libraries. We should build the 
> toolchain with basic debug information (-g1) to get reliable stack traces.
> For some libraries, we want to compile with full debug information (-g) to 
> allow the ability to step through the code with a debugger. Currently, ORC 
> and Kudu (and others) are built with -g and should stay that way. We should 
> add -g for Thrift.
> To save space, we should also enable compressed debug information (-gz) to 
> keep the sizes from growing too much (and reduce the size of existing debug 
> information).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13057.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Incorporate tuple/slot information into the tuple cache key
> ---
>
> Key: IMPALA-13057
> URL: https://issues.apache.org/jira/browse/IMPALA-13057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Since the tuple and slot information is kept separately in the descriptor 
> table, it does not get incorporated into the PlanNode thrift used for the 
> tuple cache key. This means that the tuple cache can't distinguish between 
> these two queries:
> {noformat}
> select int_col1 from table;
> select int_col2 from table;{noformat}
> To solve this, the tuple/slot information needs to be incorporated into the 
> cache key. PlanNode::initThrift() walks through each tuple, so this is a good 
> place to serialize the TupleDescriptor/SlotDescriptors and incorporate it 
> into the hash.
> The tuple ids and slot ids are global ids, so the value is influenced by the 
> entirety of the query. This is a problem for matching cache results across 
> different queries. As part of incorporating the tuple/slot information, we 
> should also add an ability to translate tuple/slot ids into ids local to a 
> subtree.
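The fix described in the issue, folding descriptor contents into the key hash, can be sketched in Python. The function name and descriptor shape here are assumptions for illustration; Impala's real implementation hashes serialized thrift structures inside the planner, not Python tuples:

```python
import hashlib

# Hypothetical sketch: the cache key hash covers not just the plan-node
# bytes but also a canonical serialization of each referenced tuple/slot
# descriptor, so two plans that differ only in which slot they read
# produce different keys.
def tuple_cache_key(plan_node_bytes, tuple_descriptors):
    h = hashlib.sha256()
    h.update(plan_node_bytes)
    for tuple_id, slots in tuple_descriptors:
        # Assumed shape: (local_tuple_id, [(local_slot_id, col_name, col_type)]).
        # Local (subtree-relative) ids avoid the global-id matching problem
        # described above.
        h.update(str(tuple_id).encode())
        for slot in sorted(slots):
            h.update(repr(slot).encode())
    return h.hexdigest()

# The two queries from the description now hash differently:
key1 = tuple_cache_key(b"scan-node", [(0, [(0, "int_col1", "INT")])])
key2 = tuple_cache_key(b"scan-node", [(0, [(0, "int_col2", "INT")])])
```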



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Resolved] (IMPALA-13072) Toolchain: Add retries for uploading artifacts to the s3 buckets

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13072.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Toolchain: Add retries for uploading artifacts to the s3 buckets
> 
>
> Key: IMPALA-13072
> URL: https://issues.apache.org/jira/browse/IMPALA-13072
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> On ARM toolchain builds, we have seen some failures to upload tarballs to s3:
> {noformat}
> 22:17:06 impala-toolchain-redhat8: Uploading 
> /mnt/build/llvm-5.0.1-asserts-p7-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/33-f93e2c9a86/llvm/5.0.1-asserts-p7-gcc-10.4.0/llvm-5.0.1-asserts-p7-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 22:17:06 impala-toolchain-redhat8: /mnt/functions.sh: line 385: 680012 
> Segmentation fault      (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"{noformat}
> Since we do many uploads, even a relatively low failure rate can make it hard 
> to get a passing build. We should change the code to retry the upload.
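The retry approach can be sketched as a small bash loop. This is a hypothetical stand-in, not the code in functions.sh: 'flaky_upload' simulates the real 'aws s3 cp' invocation, and the attempt limit mirrors the 10 retries mentioned in the fix:

```shell
#!/bin/bash
# Hypothetical retry loop: rerun a flaky command up to MAX_ATTEMPTS times,
# failing the build only if every attempt fails.
MAX_ATTEMPTS=10
attempt=1
flaky_upload() {  # simulated upload: fails twice, then succeeds
  [[ "$attempt" -ge 3 ]]
}
until flaky_upload; do
  if (( attempt >= MAX_ATTEMPTS )); then
    echo "Upload failed after ${MAX_ATTEMPTS} attempts" >&2
    exit 1
  fi
  attempt=$((attempt + 1))
  sleep 0  # the real script would back off between attempts here
done
echo "Upload succeeded on attempt ${attempt}"
```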



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Assigned] (IMPALA-13073) Toolchain builds should pass VERBOSE=1 into make

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13073:
--

Assignee: Joe McDonnell

> Toolchain builds should pass VERBOSE=1 into make
> 
>
> Key: IMPALA-13073
> URL: https://issues.apache.org/jira/browse/IMPALA-13073
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> It is useful to be able to examine the compilation flags for toolchain 
> components. Sometimes we want to add -fno-omit-frame-pointer or add debug 
> symbols with -g1 and verify that the flag actually gets set. For projects that use 
> CMake, the output often does not print the compile command. CMake can produce 
> a compilation database, but it is simpler to have make print the compilation 
> command by adding VERBOSE=1. The output isn't that big and output gets 
> redirected to a file, so it seems like we could leave it on by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13072) Toolchain: Add retries for uploading artifacts to the s3 buckets

2024-05-31 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851229#comment-17851229
 ] 

Joe McDonnell commented on IMPALA-13072:


Fixed by this commit:
{noformat}
commit f601ec33f2bcfaab19a46cff5fc6f0a90e22da8d
Author: Joe McDonnell 
Date:   Fri May 10 17:22:56 2024 -0700

    IMPALA-13072: Add retries for s3 uploads to combat flakiness
    
    On ARM toolchain builds, we have seen some uploads to s3 fail
    with a segmentation fault. Given the number of artifacts that
    the toolchain uploads, even a relatively low error rate can
    make it hard to get a passing build. This modifies the s3
    upload code to retry up to 10 times to avoid this flakiness.
    
    Testing:
     - Ran an ARM toolchain build and saw the retry happen
       successfully
     - Ran a toolchain build with an invalid s3 bucket and verified
       it failed after 10 retries
    
    Change-Id: I95d858c99e965730303c2bfd90478ac5f68acf83
    Reviewed-on: http://gerrit.cloudera.org:8080/21421
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Toolchain: Add retries for uploading artifacts to the s3 buckets
> 
>
> Key: IMPALA-13072
> URL: https://issues.apache.org/jira/browse/IMPALA-13072
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> On ARM toolchain builds, we have seen some failures to upload tarballs to s3:
> {noformat}
> 22:17:06 impala-toolchain-redhat8: Uploading 
> /mnt/build/llvm-5.0.1-asserts-p7-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/33-f93e2c9a86/llvm/5.0.1-asserts-p7-gcc-10.4.0/llvm-5.0.1-asserts-p7-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 22:17:06 impala-toolchain-redhat8: /mnt/functions.sh: line 385: 680012 
> Segmentation fault      (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"{noformat}
> Since we do many uploads, even a relatively low failure rate can make it hard 
> to get a passing build. We should change the code to retry the upload.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13111.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> impala-gdb.py's find-query-ids/find-fragment-instances return unusable query 
> ids
> 
>
> Key: IMPALA-13111
> URL: https://issues.apache.org/jira/browse/IMPALA-13111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
> information about the queries / fragments running in a core file. However, 
> the query/fragment ids that it returns have issues with the signedness of the 
> integers:
> {noformat}
> (gdb) find-fragment-instances
> Fragment Instance Id    Thread IDs
> -23b76c1699a831a1:279358680036    [117120]
> -23b76c1699a831a1:279358680037    [117121]
> -23b76c1699a831a1:279358680038    [117122]
> ..
> (gdb) find-query-ids
> -3cbda1606b3ade7c:f170c4bd
> -23b76c1699a831a1:27935868
> 68435df1364aa90f:1752944f
> 3442ed6354c7355d:78c83d20{noformat}
> The low values for find-query-ids don't have this problem, because the low 
> value is ANDed with 0x:
> {noformat}
>             qid_low = format(int(qid_low, 16) & 0x, 
> 'x'){noformat}
> We can fix the other locations by ANDing with 0x.
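The mask constants were stripped by the mail archive above, but the underlying fix is standard signed-to-unsigned conversion. A sketch, assuming 64-bit ids, with the sample value taken from the output above:

```python
# gdb's Python API hands back sign-extended values, so a 64-bit id whose
# top bit is set prints as a negative hex number. Masking with the 64-bit
# all-ones constant recovers the unsigned form that matches Impala's
# query-id formatting. (The archive above elided these mask constants.)
def to_unsigned_hex(value, bits=64):
    return format(value & ((1 << bits) - 1), 'x')

signed_high = -0x23b76c1699a831a1            # as printed by find-query-ids
unsigned_high = to_unsigned_hex(signed_high)  # 'dc4893e96657ce5f'
```

Values that are already non-negative pass through unchanged, which is why only the sign-extended ids looked unusable.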



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Assigned] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13121:
--

Assignee: Joe McDonnell

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Assigned] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13111:
--

Assignee: Joe McDonnell

> impala-gdb.py's find-query-ids/find-fragment-instances return unusable query 
> ids
> 
>
> Key: IMPALA-13111
> URL: https://issues.apache.org/jira/browse/IMPALA-13111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
> information about the queries / fragments running in a core file. However, 
> the query/fragment ids that they return have issues with the signedness of 
> the integers:
> {noformat}
> (gdb) find-fragment-instances
> Fragment Instance Id    Thread IDs
> -23b76c1699a831a1:279358680036    [117120]
> -23b76c1699a831a1:279358680037    [117121]
> -23b76c1699a831a1:279358680038    [117122]
> ..
> (gdb) find-query-ids
> -3cbda1606b3ade7c:f170c4bd
> -23b76c1699a831a1:27935868
> 68435df1364aa90f:1752944f
> 3442ed6354c7355d:78c83d20{noformat}
> The low values for find-query-ids don't have this problem, because they are 
> ANDed with 0x:
> {noformat}
>             qid_low = format(int(qid_low, 16) & 0x, 
> 'x'){noformat}
> We can fix the other locations by ANDing with 0x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Created] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-05-31 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13121:
--

 Summary: Move the toolchain to a newer version of ccache
 Key: IMPALA-13121
 URL: https://issues.apache.org/jira/browse/IMPALA-13121
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
debug info, I ran into a case where the debug level was not what I expected. I 
had added a -g0 at the end to turn off debug information for the cmake build, 
but it still ended up with debug info.

The release notes for ccache 3.3.5 say this:
 * Fixed a regression where the original order of debug options could be lost. 
This reverts the “Improved parsing of {{-g*}} options” feature in ccache 3.3.

[https://ccache.dev/releasenotes.html#_ccache_3_3_5]

I think I may have been hitting that. We should upgrade ccache to a more recent 
version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-28 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13111:
--

 Summary: impala-gdb.py's find-query-ids/find-fragment-instances 
return unusable query ids
 Key: IMPALA-13111
 URL: https://issues.apache.org/jira/browse/IMPALA-13111
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
information about the queries / fragments running in a core file. However, the 
query/fragment ids that they return have issues with the signedness of the 
integers:
{noformat}
(gdb) find-fragment-instances
Fragment Instance Id    Thread IDs
-23b76c1699a831a1:279358680036    [117120]
-23b76c1699a831a1:279358680037    [117121]
-23b76c1699a831a1:279358680038    [117122]
..

(gdb) find-query-ids
-3cbda1606b3ade7c:f170c4bd
-23b76c1699a831a1:27935868
68435df1364aa90f:1752944f
3442ed6354c7355d:78c83d20{noformat}
The low values for find-query-ids don't have this problem, because they are 
ANDed with 0x:
{noformat}
            qid_low = format(int(qid_low, 16) & 0x, 
'x'){noformat}
We can fix the other locations by ANDing with 0x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Resolved] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13020.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patched Thrift that allows an int64_t max message size and that limit 
> set to a larger value, Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as older Thrift 
> versions did not enforce this limit, so this is something we should fix to 
> avoid upgrade issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13020:
--

Assignee: Joe McDonnell

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patched Thrift that allows an int64_t max message size and that limit 
> set to a larger value, Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as older Thrift 
> versions did not enforce this limit, so this is something we should fix to 
> avoid upgrade issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)







[jira] [Created] (IMPALA-13082) Use separate versions for jackson-databind vs jackson-core, etc.

2024-05-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13082:
--

 Summary: Use separate versions for jackson-databind vs 
jackson-core, etc.
 Key: IMPALA-13082
 URL: https://issues.apache.org/jira/browse/IMPALA-13082
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


We have a single jackson-databind.version property, populated from 
IMPALA_JACKSON_DATABIND_VERSION. This currently sets the version for 
jackson-databind as well as other jackson libraries like jackson-core.

Sometimes there is a jackson-databind patch release without a release of other 
jackson libraries. For example, there is a jackson-databind 2.12.7.1, but there 
is no jackson-core 2.12.7.1. There is only jackson-core 2.12.7. To handle these 
patch scenarios, it is useful to split out the jackson-databind version from 
the version for other jackson libraries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Created] (IMPALA-13073) Toolchain builds should pass VERBOSE=1 into make

2024-05-11 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13073:
--

 Summary: Toolchain builds should pass VERBOSE=1 into make
 Key: IMPALA-13073
 URL: https://issues.apache.org/jira/browse/IMPALA-13073
 Project: IMPALA
  Issue Type: Improvement
Reporter: Joe McDonnell


It is useful to be able to examine the compilation flags for toolchain 
components. Sometimes we want to add -fno-omit-frame-pointer or add debug 
symbols with -g1 and verify that the flag actually gets set. For projects that 
use CMake, the output often does not print the compile command. CMake can 
produce a compilation database, but it is simpler to have make print the 
compilation command by adding VERBOSE=1. The output isn't that big, and it 
gets redirected to a file, so it seems like we could leave it on by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





