[jira] [Commented] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-26 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17869039#comment-17869039
 ] 

Joe McDonnell commented on IMPALA-13253:


The AWS LB has an idle time limit of 350 seconds that does not explicitly 
notify either end that the connection is dead: 
[https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-configurable-idle-timeout-for-connection-tracking/]

The libkeepalive library can be used to force a program to use TCP keepalive 
without needing to recompile it: 
[https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html]

Testing with libkeepalive and iptables shows that TCP keepalive behaves as 
expected: it can handle situations where packets are dropped or rejected. In a 
cluster that uses the AWS LB, the keepalive time can be set to 400 seconds to 
quickly detect and close connections that the AWS LB considers idle.

I think keepalive should be on by default.
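For reference, the keepalive knobs discussed above can also be set per-socket without libkeepalive. This is a minimal Python sketch; the 400-second idle time mirrors the AWS LB discussion, while the interval and probe count are illustrative values, not Impala settings:

```python
import socket

def enable_keepalive(sock, idle_s=400, interval_s=30, count=3):
    # Turn on keepalive probes for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; guarded so the sketch degrades gracefully elsewhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe (400s per the AWS LB notes).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
```

With these values a dead peer is detected roughly idle_s + interval_s * count seconds after the connection goes quiet.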

> Add option to use TCP keepalives for client connections
> ---
>
> Key: IMPALA-13253
> URL: https://issues.apache.org/jira/browse/IMPALA-13253
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Clients
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Blocker
>
> A client can be disconnected without explicitly closing its TCP connection. 
> This can happen if the client machine resets or there is a network 
> disruption. In particular, load balancers can have an idle time that results 
> in a connection becoming invalid. Impala can't really guarantee that the 
> client will properly tear down its connection and the Impala side resources 
> will be released.
> TCP keepalive would allow Impala to detect dead clients and close the 
> connection. It also can prevent a load balancer from seeing the connection as 
> idle. This can be important for clients that hold connections in a pool.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13253:
---
Priority: Blocker  (was: Critical)







[jira] [Commented] (IMPALA-13202) Impala workloads can exceed Kudu client's rpc_max_message_size limit

2024-07-24 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868423#comment-17868423
 ] 

Joe McDonnell commented on IMPALA-13202:


I filed https://issues.apache.org/jira/browse/KUDU-3595 for the Kudu-side 
change. This Jira will track the Impala side change to pick up a new Kudu and 
add a startup parameter to set it.

> Impala workloads can exceed Kudu client's rpc_max_message_size limit
> 
>
> Key: IMPALA-13202
> URL: https://issues.apache.org/jira/browse/IMPALA-13202
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: data.parquet
>
>
> The way Impala integrates with KRPC is by porting the KRPC code into the Impala 
> code base. Flags and methods of KRPC are defined as GLOBAL in the impalad 
> executable. libkudu_client.so is also compiled from the same KRPC code and has 
> duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the 
> symbol for kudu::rpc::InboundTransfer::ReceiveBuffer() 
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>  8: 022f5c88  1936 FUNCGLOBAL DEFAULT   13 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNCGLOBAL DEFAULT   13 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so
>  | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNCLOCAL  DEFAULT   12 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNCLOCAL  HIDDEN12 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt 
> _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*) 
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad 
> executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep 
> FLAGS_rpc_max_message_size
>  14380: 06006738 8 OBJECT  GLOBAL DEFAULT   30 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741 1 OBJECT  GLOBAL DEFAULT   30 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741 1 OBJECT  GLOBAL DEFAULT   30 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738 8 OBJECT  GLOBAL DEFAULT   30 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so
>  | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1 1 OBJECT  LOCAL  HIDDEN27 
> _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8 8 OBJECT  LOCAL  DEFAULT   27 
> _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own methods and flags. The flags are HIDDEN, so they 
> can't be modified by Impala code. E.g. IMPALA-4874 bumps 
> FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but the HIDDEN variable 
> FLAGS_rpc_max_message_size used in libkudu_client.so is still the default 
> value 50MB (52428800). We've seen error messages like this in the master 
> branch:
> {code:java}
> I0708 10:23:31.784974  2943 meta_cache.cc:294] 
> c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: 
> replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: 
> Network error: TS failed: RPC frame had a length of 53477464, but we only 
> support messages up to 52428800 bytes long.{code}
> CC [~joemcdonnell] [~wzhou] [~aserbin] 






[jira] [Commented] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets

2024-07-24 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868420#comment-17868420
 ] 

Joe McDonnell commented on IMPALA-13183:


Here is an AWS blog post about how the AWS LB works: 
[https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-configurable-idle-timeout-for-connection-tracking/]

The section about "Scenario #1: TCP connections through AWS Services" explains 
that it doesn't send packets when a connection goes idle. An endpoint would 
only find out when it sends a message. I think this is a problem for Impala, 
and having an idle connection timeout would be one way to avoid issues.
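As a sketch of the idle-timeout idea (in plain Python sockets rather than Impala's Thrift transports; the one-hour default mirrors the value suggested in this issue, and the function name is illustrative):

```python
import socket

def serve_with_timeout(conn, timeout_s=3600):
    """Receive with a backstop timeout; close connections that stay idle.

    Illustrative only: a real server would loop, but the timeout/close
    pattern is the point here.
    """
    conn.settimeout(timeout_s)
    try:
        data = conn.recv(4096)
    except socket.timeout:
        # Idle past the limit: drop the connection and free the service thread.
        conn.close()
        return None
    return data
```

Unlike TCP keepalive, this also bounds how long a live-but-silent client can hold a thread, which addresses the "endpoint only finds out when it sends" problem described above.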

> Add default timeout for hs2/beeswax server sockets
> --
>
> Key: IMPALA-13183
> URL: https://issues.apache.org/jira/browse/IMPALA-13183
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Currently Impala only sets a timeout for specific operations, for example 
> during the SASL handshake and when checking if a connection can be closed due 
> to an idle session.
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/rpc/TAcceptQueueServer.cpp#L153
> https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/transport/TSaslServerTransport.cpp#L145
> There are several cases where an inactive client could keep the connection 
> open indefinitely, for example if it hasn't opened a session yet.
> I think that there should be a general longer timeout set for both send/recv, 
> e.g. a flag client_default_timeout_s=3600.






[jira] [Updated] (IMPALA-13202) Impala workloads can exceed Kudu client's rpc_max_message_size limit

2024-07-24 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13202:
---
Summary: Impala workloads can exceed Kudu client's rpc_max_message_size 
limit  (was: KRPC flags used by libkudu_client.so can't be configured)







[jira] [Commented] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets

2024-07-23 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868178#comment-17868178
 ] 

Joe McDonnell commented on IMPALA-13183:


I was just about to file a Jira about having functionality to close idle 
connections. This sounds similar, so I'm commenting here. We can split it off 
if it is not quite the same.

Basically, there is currently no mechanism to close idle connections that have 
no session. There are circumstances where Hue and other clients that use a 
connection pool can create these connections. For example, Hue might want to 
close a query that was executed by a different connection. It opens a connection 
using the existing session, then when it tries to close the query/session, it 
finds out that the query/session was already closed. This connection ends up 
with no associated session and can stay that way indefinitely.

We have seen cases where these connections stay open on the server side even 
after the client tries to close them. That seems to happen with certain load 
balancers, and it can cause the server to run out of fe service threads.







[jira] [Created] (IMPALA-13253) Add option to use TCP keepalives for client connections

2024-07-23 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13253:
--

 Summary: Add option to use TCP keepalives for client connections
 Key: IMPALA-13253
 URL: https://issues.apache.org/jira/browse/IMPALA-13253
 Project: IMPALA
  Issue Type: Task
  Components: Backend, Clients
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A client can be disconnected without explicitly closing its TCP connection. 
This can happen if the client machine resets or there is a network disruption. 
In particular, load balancers can have an idle time that results in a 
connection becoming invalid. Impala can't really guarantee that the client will 
properly tear down its connection and the Impala side resources will be 
released.

TCP keepalive would allow Impala to detect dead clients and close the 
connection. It also can prevent a load balancer from seeing the connection as 
idle. This can be important for clients that hold connections in a pool.






[jira] [Commented] (IMPALA-13230) Add a way to dump stack traces for impala-shell while it is running

2024-07-16 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866577#comment-17866577
 ] 

Joe McDonnell commented on IMPALA-13230:


Example stack trace while running a query:
{noformat}
  File "shell/build/python3_venv/bin/impala-shell", line 8, in <module>
    sys.exit(impala_shell_main())
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py",
 line 2305, in impala_shell_main
    shell.cmdloop(intro)
  File "/usr/lib/python3.8/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py",
 line 788, in onecmd
    return func(arg)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py",
 line 1239, in do_select
    return self._execute_stmt(query_str, print_web_link=True)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_shell.py",
 line 1426, in _execute_stmt
    for rows in rows_fetched:
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py",
 line 926, in fetch
    resp = self._do_hs2_rpc(FetchResults, req)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py",
 line 1148, in _do_hs2_rpc
    rpc_output = rpc(rpc_input)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/impala_client.py",
 line 920, in FetchResults
    return self.imp_service.FetchResults(req)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/TCLIService/TCLIService.py",
 line 756, in FetchResults
    return self.recv_FetchResults()
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/impala_shell/TCLIService/TCLIService.py",
 line 768, in recv_FetchResults
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py",
 line 134, in readMessageBegin
    sz = self.readI32()
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py",
 line 217, in readI32
    buff = self.trans.readAll(4)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TTransport.py",
 line 62, in readAll
    chunk = self.read(sz - have)
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TTransport.py",
 line 164, in read
    self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File 
"/home/joemcdonnell/upstream/Impala/shell/build/python3_venv/lib/python3.8/site-packages/thrift/transport/TSocket.py",
 line 150, in read
    buff = self.handle.recv(sz)
{noformat}

> Add a way to dump stack traces for impala-shell while it is running
> ---
>
> Key: IMPALA-13230
> URL: https://issues.apache.org/jira/browse/IMPALA-13230
> Project: IMPALA
>  Issue Type: Task
>  Components: Clients
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> It can be useful to get the Python stack traces for impala-shell when it is 
> stuck. There is a nice thread on Stack Overflow about how to do this: 
> [https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application]
> One option is to install a signal handler for the SIGUSR1 signal and use that 
> to dump a backtrace. I tried this and it works for Python 3 (but causes 
> issues for running queries on Python 2):
> {noformat}
>     # For debugging, it is useful to handle the SIGUSR1 signal and use it to
>     # print a stacktrace
>     signal.signal(signal.SIGUSR1, lambda sid, stack: traceback.print_stack(stack))
> {noformat}
> Another option mentioned is the faulthandler module 
> ([https://docs.python.org/dev/library/faulthandler.html]), which provides a 
> way to do the same thing. The faulthandler module seems to be able to do this 
> for all threads, not just the main thread.
> Either way, this would give us some options if we need to debug impala-shell 
> out in the wild.




[jira] [Created] (IMPALA-13230) Add a way to dump stack traces for impala-shell while it is running

2024-07-16 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13230:
--

 Summary: Add a way to dump stack traces for impala-shell while it 
is running
 Key: IMPALA-13230
 URL: https://issues.apache.org/jira/browse/IMPALA-13230
 Project: IMPALA
  Issue Type: Task
  Components: Clients
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


It can be useful to get the Python stack traces for impala-shell when it is 
stuck. There is a nice thread on Stack Overflow about how to do this: 
[https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application]

One option is to install a signal handler for the SIGUSR1 signal and use that 
to dump a backtrace. I tried this and it works for Python 3 (but causes issues 
for running queries on Python 2):
{noformat}
    # For debugging, it is useful to handle the SIGUSR1 signal and use it to
    # print a stacktrace
    signal.signal(signal.SIGUSR1, lambda sid, stack: traceback.print_stack(stack))
{noformat}
Another option mentioned is the faulthandler module 
([https://docs.python.org/dev/library/faulthandler.html]), which provides a way 
to do the same thing. The faulthandler module seems to be able to do this for 
all threads, not just the main thread.

Either way, this would give us some options if we need to debug impala-shell 
out in the wild.
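A minimal sketch of the faulthandler variant, assuming SIGUSR1 is free for this use in impala-shell:

```python
import faulthandler
import signal
import sys

# Dump every thread's Python stack to stderr when SIGUSR1 arrives.
# (faulthandler installs a C-level handler, so this works even if the
# main thread is blocked in a recv() like the trace in the comment above.)
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# From another terminal: kill -USR1 <impala-shell pid>
```

Because faulthandler writes directly to a file descriptor from the signal handler, it avoids the Python-2-era issues the signal.signal approach hit with interrupted system calls.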






[jira] [Created] (IMPALA-13229) Improve logging for TAcceptQueueServer when a thread takes a long time in SASL negotiation

2024-07-16 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13229:
--

 Summary: Improve logging for TAcceptQueueServer when a thread 
takes a long time in SASL negotiation
 Key: IMPALA-13229
 URL: https://issues.apache.org/jira/browse/IMPALA-13229
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


In IMPALA-11653, we are concerned about bad clients that use up threads in the 
SASL negotiation thread pool for long periods of time (or eventually hit 
sasl_connect_tcp_timeout_ms).

As a separate task, it would be useful to be able to quickly tell from the logs 
whether a connection spends a lot of time in the SASL negotiation and could be 
creating this type of problem.

We should add some logging to make this issue clear from the logs. One option 
is to log a warning if SASL negotiation takes longer than some threshold (and 
thus was using up a thread during that time). If SASL negotiation is taking 
longer than a few seconds, that can be a real issue.






[jira] [Commented] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured

2024-07-15 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866157#comment-17866157
 ] 

Joe McDonnell commented on IMPALA-13202:


It seems like one path would be for Kudu client to add this as a configuration 
in KuduClientBuilder and then Impala could specify the value there. That is how 
we usually pass in configuration parameters for the Kudu client. See 
[https://github.com/apache/impala/blob/master/be/src/exec/kudu/kudu-util.cc#L85-L104]
 . I think it is good to have these things as part of the client API rather 
than setting global variables. I think it is good that Kudu client's flags are 
hidden and can't be set.

My understanding is that Impala's rpc_max_message_size parameter was intended 
to apply for Impala to Impala communication, not Impala to Kudu communication.

> KRPC flags used by libkudu_client.so can't be configured
> 
>
> Key: IMPALA-13202
> URL: https://issues.apache.org/jira/browse/IMPALA-13202
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: data.parquet
>
>






[jira] [Work started] (IMPALA-12906) Incorporate run time scan range information into the tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12906 started by Joe McDonnell.
--
> Incorporate run time scan range information into the tuple cache key
> 
>
> Key: IMPALA-12906
> URL: https://issues.apache.org/jira/browse/IMPALA-12906
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The cache key for tuple caching currently doesn't incorporate information 
> about the scan ranges for the tables that it scans. This is important for 
> detecting changes in the table and having different cache keys for different 
> fragment instances that are assigned different scan ranges.
> To make this deterministic for mt_dop, we need mt_dop to assign scan ranges 
> deterministically to individual fragment instances rather than using the 
> shared queue introduced in IMPALA-9655.
> One way to implement this is to collect information about the scan nodes that 
> feed into the tuple cache and pass that information over to the tuple cache 
> node. At runtime, it can hash the scan ranges assigned to those scan nodes 
> and incorporate that into the cache key.
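One hypothetical shape for that runtime hashing step (the function name and the (path, offset, length) range representation are assumptions for illustration, not Impala's actual structures):

```python
import hashlib

def scan_range_cache_key(base_key, scan_ranges):
    """Fold a deterministic hash of a fragment instance's assigned scan
    ranges into its tuple cache key.

    Sorting makes the key independent of assignment order, so two instances
    given the same ranges produce the same key.
    """
    h = hashlib.sha256(base_key.encode())
    for path, offset, length in sorted(scan_ranges):
        h.update(f"{path}:{offset}:{length}".encode())
    return h.hexdigest()
```

Any change to the underlying files (new file, different offsets or lengths) changes the key, which gives the cache-invalidation behavior the description asks for.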






[jira] [Assigned] (IMPALA-12906) Incorporate run time scan range information into the tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12906:
--

Assignee: Joe McDonnell

> Incorporate run time scan range information into the tuple cache key
> 
>
> Key: IMPALA-12906
> URL: https://issues.apache.org/jira/browse/IMPALA-12906
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The cache key for tuple caching currently doesn't incorporate information 
> about the scan ranges for the tables that it scans. This is important for 
> detecting changes in the table and having different cache keys for different 
> fragment instances that are assigned different scan ranges.
> To make this deterministic for mt_dop, we need mt_dop to assign scan ranges 
> deterministically to individual fragment instances rather than using the 
> shared queue introduced in IMPALA-9655.
> One way to implement this is to collect information about the scan nodes that 
> feed into the tuple cache and pass that information over to the tuple cache 
> node. At runtime, it can hash the scan ranges assigned to those scan nodes 
> and incorporate that into the cache key.






[jira] [Assigned] (IMPALA-12817) Introduce basic intermediate result caching to speed similar queries

2024-06-27 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12817:
--

Assignee: Joe McDonnell

> Introduce basic intermediate result caching to speed similar queries
> 
>
> Key: IMPALA-12817
> URL: https://issues.apache.org/jira/browse/IMPALA-12817
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> This tracks the first phase of intermediate result caching.
> The goals of the initial phase are to introduce a basic framework for caching 
> tuples at various points in the plan. The first location that needs to work 
> is immediately above an HdfsScanNode. Caching will use a local SSD to store 
> the cache.






[jira] [Created] (IMPALA-13188) Add test that compute stats does not result in a different tuple cache key

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13188:
--

 Summary: Add test that compute stats does not result in a 
different tuple cache key
 Key: IMPALA-13188
 URL: https://issues.apache.org/jira/browse/IMPALA-13188
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


If someone runs "compute stats" on the underlying tables for a query, the tuple 
cache key should only change if the plan actually changes. The resource 
estimates should not be incorporated into the tuple cache key as they have no 
semantic impact. The code already excludes the resource estimates from the key 
for the PlanNode, but we should have tests for computing stats and verifying 
that the key doesn't change.






[jira] [Created] (IMPALA-13186) Tuple cache keys should incorporate information about related query options

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13186:
--

 Summary: Tuple cache keys should incorporate information about 
related query options
 Key: IMPALA-13186
 URL: https://issues.apache.org/jira/browse/IMPALA-13186
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Currently, the tuple cache key does not include information from the query 
options. Many query options have no impact on the result of a query (e.g. 
idle_session_timeout) or are evaluated purely on the coordinator during 
planning (e.g. broadcast_bytes_limit). 

However, some query options can impact behavior either by controlling how 
certain things are calculated (e.g. decimal_v2) or controlling what conditions 
result in an error. Changing a query option can change the output of a query.

We need some way to incorporate the relevant query options into the tuple cache 
key so there is no correctness issue.
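One way to incorporate only the result-affecting options is to hash an allowlisted subset into the key. A minimal sketch, assuming a hypothetical option set (the allowlist contents here are illustrative, not Impala's actual classification):

```python
import hashlib

# Assumed example set of options that can change query results;
# purely illustrative, not Impala's real classification.
RESULT_AFFECTING = {"decimal_v2", "abort_on_error"}

def cache_key_options(options):
    # Only result-affecting options contribute to the tuple cache key.
    relevant = sorted((k, v) for k, v in options.items() if k in RESULT_AFFECTING)
    return hashlib.sha256(repr(relevant).encode()).hexdigest()

a = cache_key_options({"decimal_v2": "true", "idle_session_timeout": "60"})
b = cache_key_options({"decimal_v2": "true", "idle_session_timeout": "3600"})
c = cache_key_options({"decimal_v2": "false", "idle_session_timeout": "60"})
assert a == b  # changing an irrelevant option keeps the key stable
assert a != c  # changing a result-affecting option changes the key
```

Keeping irrelevant options out of the key matters for hit rates: otherwise every session-level tweak would invalidate cached tuples.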






[jira] [Created] (IMPALA-13185) Tuple cache keys need to incorporate runtime filter information

2024-06-27 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13185:
--

 Summary: Tuple cache keys need to incorporate runtime filter 
information
 Key: IMPALA-13185
 URL: https://issues.apache.org/jira/browse/IMPALA-13185
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


If a runtime filter impacts the results of a fragment, then the tuple cache key 
needs to incorporate information about the generation of that runtime filter. 
This needs to include information about the base tables that impact the runtime 
filter.

For example, suppose there is a join. The build side of the join produces a 
runtime filter that gets delivered to the probe side of the join. The tuple 
cache key for the probe side of the join will need to include a representation 
of the runtime filter. If the table on the build side of the join changes, the 
tuple cache key for the probe side needs to change due to the possible 
difference in the runtime filter.

This can also impact eligibility. In theory, the build side of a join could be 
constructed from a source with a limit specified, and this can result in 
non-determinism. Since the build of the runtime filter is not deterministic, 
the consumer of the runtime filter is not deterministic and can't participate 
in tuple caching.
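The dependency described above can be modeled by folding a fingerprint of the runtime filter's build side into the probe side's key. This is an illustrative sketch under assumed names (snapshot identifiers and key formats are made up):

```python
import hashlib

def h(*parts):
    # Simple fingerprint helper over string parts (illustrative only).
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# The probe-side tuple cache key includes a fingerprint of the build
# side that produces the runtime filter, so a change to the build
# table propagates into the probe side's key.
build_v1 = h("scan build_table", "snapshot=100")
build_v2 = h("scan build_table", "snapshot=101")  # build table changed
probe_v1 = h("scan probe_table", "filter<-" + build_v1)
probe_v2 = h("scan probe_table", "filter<-" + build_v2)
assert probe_v1 != probe_v2  # probe key changes with the build side
```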






[jira] [Created] (IMPALA-13181) Disable tuple caching for locations that have a limit

2024-06-25 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13181:
--

 Summary: Disable tuple caching for locations that have a limit
 Key: IMPALA-13181
 URL: https://issues.apache.org/jira/browse/IMPALA-13181
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Statements that use a limit are non-deterministic unless there is a sort. 
Locations with limits should be marked ineligible for tuple caching.

As an example, for a hash join, suppose the build side has a limit. This means 
that the build side could vary from run to run. A requirement for our 
correctness is that all nodes agree on the contents of the build side. The 
variability of the limit is a problem for the build side, because if one node 
hits the cache and another does not, there is no guarantee that they agree on 
the contents of the build side.

Concrete example: 
{noformat}
select a.l_orderkey from (select l_orderkey from tpch_parquet.lineitem limit 
10) a, tpch_parquet.orders b where a.l_orderkey = b.o_orderkey;{noformat}
There are times when limits are deterministic or the non-determinism is 
harmless. It is safer to ban it completely at first. In a future change, this 
rule can be relaxed to allow caching in those cases.






[jira] [Created] (IMPALA-13179) Disable tuple caching when using non-deterministic functions

2024-06-25 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13179:
--

 Summary: Disable tuple caching when using non-deterministic 
functions
 Key: IMPALA-13179
 URL: https://issues.apache.org/jira/browse/IMPALA-13179
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Some functions are non-deterministic, so tuple caching needs to detect those 
functions and avoid caching at locations that are non-deterministic.

There are two different pieces:
 # Correctness: If the key is constant but the results can be variable, then 
that is a correctness issue. That can happen for genuinely random functions 
like uuid(). It can happen when timestamp functions like now() are evaluated at 
runtime.
 # Performance: The frontend does constant-folding of functions that don't vary 
during execution, so something like now() might be replaced by a hard-coded 
integer. This means that the key contains something that varies frequently. 
That can be a performance issue, because we can end up caching things that 
cannot be reused. This doesn't have the same correctness issue.

This ticket is focused on the correctness piece. If uuid()/now()/etc. are 
referenced and would be evaluated at runtime, the location should be ineligible 
for tuple caching.
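The correctness check amounts to scanning the expressions at a candidate cache location for known non-deterministic functions. A minimal sketch, with an assumed (incomplete) function list:

```python
# Illustrative eligibility check; the function list is an assumed
# example, not Impala's actual builtin classification.
NON_DETERMINISTIC_FNS = {"uuid", "now", "rand", "random"}

def is_cache_eligible(referenced_fns):
    # A location is ineligible if any expression it evaluates at
    # runtime references a non-deterministic function.
    return not any(f in NON_DETERMINISTIC_FNS for f in referenced_fns)

assert is_cache_eligible({"upper", "concat"})
assert not is_cache_eligible({"upper", "uuid"})
```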






[jira] [Resolved] (IMPALA-12541) Compile toolchain GCC with --enable-linker-build-id to add Build ID to binaries

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12541.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Compile toolchain GCC with --enable-linker-build-id to add Build ID to 
> binaries
> ---
>
> Key: IMPALA-12541
> URL: https://issues.apache.org/jira/browse/IMPALA-12541
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> A "Build ID" is a unique identifier for binaries (which is a hash of the 
> contents). Producing OS packages with separate debug symbols requires each 
> binary to have a Build ID. This is particularly important for libstdc++, 
> because it is produced during the native-toolchain build rather than the 
> regular Impala build. To turn on Build IDs, one can configure that at GCC 
> build time by specifying "--enable-linker-build-id". This causes GCC to tell 
> the linker to compute the Build ID.
> Breakpad will also use the Build ID when resolving symbols.






[jira] [Commented] (IMPALA-12541) Compile toolchain GCC with --enable-linker-build-id to add Build ID to binaries

2024-06-25 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859971#comment-17859971
 ] 

Joe McDonnell commented on IMPALA-12541:


{noformat}
commit e78b0ef34241218cda7eac3b526cb6a824596df1
Author: Joe McDonnell 
Date:   Fri Nov 3 14:18:47 2023 -0700

    IMPALA-12541: Build GCC with --enable-linker-build-id
    
    This builds GCC with --enable-linker-build-id so that
    binaries have Build ID specified. Build ID is needed to
    produce OS packages with separate debuginfo. This is
    particularly important for libstdc++, because it is
    not built as part of the regular Impala build.
    
    Testing:
     - Verified that resulting binaries have .note.gnu.build-id
    
    Change-Id: Ieb2017ba1a348a9e9e549fa3268635afa94ae6d0
    Reviewed-on: http://gerrit.cloudera.org:8080/21469
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Compile toolchain GCC with --enable-linker-build-id to add Build ID to 
> binaries
> ---
>
> Key: IMPALA-12541
> URL: https://issues.apache.org/jira/browse/IMPALA-12541
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> A "Build ID" is a unique identifier for binaries (which is a hash of the 
> contents). Producing OS packages with separate debug symbols requires each 
> binary to have a Build ID. This is particularly important for libstdc++, 
> because it is produced during the native-toolchain build rather than the 
> regular Impala build. To turn on Build IDs, one can configure that at GCC 
> build time by specifying "--enable-linker-build-id". This causes GCC to tell 
> the linker to compute the Build ID.
> Breakpad will also use the Build ID when resolving symbols.






[jira] [Commented] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-06-25 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859962#comment-17859962
 ] 

Joe McDonnell commented on IMPALA-13121:


{noformat}
commit b9167e985c69fd321e9e25e5ae0c7747682f06f6
Author: Joe McDonnell 
Date:   Fri May 31 15:20:20 2024 -0700

    IMPALA-13121: Switch to ccache 3.7.12
    
    The docker images currently build and use ccache 3.3.3.
    Recently, we ran into a case where debuginfo was being
    generated even though the cflags ended with -g0. The
    ccache release history has this note for 3.3.5:
     - Fixed a regression where the original order of
       debug options could be lost.
    
    This upgrades ccache to 3.7.12 to address this issue.
    
    Ccache 3.7.12 is the last ccache release that builds
    using autotools. Ccache 4 moves to build with CMake.
    Adding a CMake dependency would be complicated at this
    stage, because some of the older OSes don't provide a
    new enough CMake in the package repositories. Since we
    don't really need the new features of Ccache 4+, this
    sticks with 3.7.12 for now.
    
    This reenables the check_ccache_works() logic in
    assert-dependencies-present.py.
    
    Testing:
     - Built docker images and ran a toolchain build
     - The newer ccache resolves the unexpected debuginfo issue
    
    Change-Id: I90d751445daa0dc298b634c1049d637a14afac40
    Reviewed-on: http://gerrit.cloudera.org:8080/21473
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.






[jira] [Resolved] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13121.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.






[jira] [Resolved] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13146.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.5.0
>
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.






[jira] [Commented] (IMPALA-13136) Refactor AnalyzedFunctionCallExpr

2024-06-12 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854526#comment-17854526
 ] 

Joe McDonnell commented on IMPALA-13136:


[~scarlin] I'm ok with punting on this for a while. We have a long list of 
things that need to land, and this is more about code cleanliness than 
functionality.

> Refactor AnalyzedFunctionCallExpr
> -
>
> Key: IMPALA-13136
> URL: https://issues.apache.org/jira/browse/IMPALA-13136
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Steve Carlin
>Priority: Major
>
> Copied from code review:
> The part where we immediately analyze as part of the constructor makes for 
> complicated exception handling. RexVisitor doesn't support exceptions, so it 
> adds complication to handle them under those circumstances. I can't really 
> explain why it is necessary.
> Let me sketch out an alternative:
> 1. Construct the whole Expr tree without analyzing it
> 2. Any errors that happen during this process are not usually actionable by 
> the end user. It's good to have a descriptive error message, but it doesn't 
> mean there is something wrong with the SQL. I think that it is ok for this 
> code to throw subclasses of RuntimeException or use 
> Preconditions.checkState() with a good explanation.
> 3. When we get the Expr tree back in CreateExprVisitor::getExpr(), we call 
> analyze() on the root node, which does a recursive analysis of the whole tree.
> 4. The special Expr classes don't run analyze() in the constructor, don't 
> keep a reference to the Analyzer, and don't override resetAnalysisState(). 
> They override analyzeImpl() and they should be idempotent. The clone 
> constructor should not need to do anything special, just do a deep copy.
> I don't want to bog down this review. If we want to address this as a 
> followup, I can live with that, but I don't want us to go too far down this 
> road. (Or if we have a good explanation for why it is necessary, then we can 
> write a good comment and move on.)






[jira] [Assigned] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13151:
--

Assignee: Michael Smith

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.






[jira] [Created] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13151:
--

 Summary: DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on 
ARM
 Key: IMPALA-13151
 URL: https://issues.apache.org/jira/browse/IMPALA-13151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
failing with errors like this:
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
actual: 269834 vs 30{noformat}
So far, I only see failures on ARM jobs.






[jira] [Assigned] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13146:
--

Assignee: Joe McDonnell

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.






[jira] [Created] (IMPALA-13147) Add support for limiting the concurrency of link jobs

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13147:
--

 Summary: Add support for limiting the concurrency of link jobs
 Key: IMPALA-13147
 URL: https://issues.apache.org/jira/browse/IMPALA-13147
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Link jobs can use a lot of memory due to the amount of debug info. The level of 
concurrency that is useful for compilation can be too high for linking. Running 
a link-heavy command like buildall.sh -skiptests can run out of memory from 
linking all of the backend tests / benchmarks.

It would be useful to be able to limit the number of concurrent link jobs. 
There are two basic approaches:

When using the ninja generator for CMake, ninja supports having job pools with 
limited parallelism. CMake has support for mapping link tasks to their own 
pool. Here is an example:
{noformat}
set(CMAKE_JOB_POOLS compilation_pool=24 link_pool=8)
set(CMAKE_JOB_POOL_COMPILE compilation_pool)
set(CMAKE_JOB_POOL_LINK link_pool){noformat}
The makefile generator does not have equivalent functionality, but we could do 
a more limited version where buildall.sh can split the -skiptests into two make 
invocations. The first does all the compilation with full parallelism 
(equivalent to -notests) and then the second make invocation does the backend 
tests / benchmarks with a reduced parallelism.






[jira] [Created] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13146:
--

 Summary: Javascript tests sometimes fail to download NodeJS
 Key: IMPALA-13146
 URL: https://issues.apache.org/jira/browse/IMPALA-13146
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


For automated tests, sometimes the Javascript tests fail to download NodeJS:
{noformat}
01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
01:37:16   % Total% Received % Xferd  Average Speed   TimeTime Time 
 Current
01:37:16  Dload  Upload   Total   SpentLeft 
 Speed
01:37:16 
  0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
  0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
  0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
  0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
...
 30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
read{noformat}
If this keeps happening, we should mirror the NodeJS binary on the 
native-toolchain s3 bucket.






[jira] [Created] (IMPALA-13145) Upgrade mold linker to 2.31.0

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13145:
--

 Summary: Upgrade mold linker to 2.31.0
 Key: IMPALA-13145
 URL: https://issues.apache.org/jira/browse/IMPALA-13145
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Mold 2.31.0 claims performance improvements and a reduction in the memory 
needed for linking. See [https://github.com/rui314/mold/releases/tag/v2.31.0] 
and 
[https://github.com/rui314/mold/commit/53ebcd80d888778cde16952270f73343f090f342]

We should move to that version as some developers are seeing issues with high 
memory usage for linking.

 






[jira] [Commented] (IMPALA-12967) Testcase fails at test_migrated_table_field_id_resolution due to "Table does not exist"

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853224#comment-17853224
 ] 

Joe McDonnell commented on IMPALA-12967:


There is a separate symptom where this test fails with a Disk I/O error. It is 
probably somewhat related, so we need to decide whether to include that symptom 
here. See IMPALA-13144.

> Testcase fails at test_migrated_table_field_id_resolution due to "Table does 
> not exist"
> ---
>
> Key: IMPALA-12967
> URL: https://issues.apache.org/jira/browse/IMPALA-12967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> Testcase test_migrated_table_field_id_resolution fails at exhaustive release 
> build with following messages:
> *Regression*
> {code:java}
> query_test.test_iceberg.TestIcebergTable.test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
>  "iceberg_migrated_alter_test_orc", "orc") common/file_utils.py:68: in 
> create_iceberg_table_from_directory file_format)) 
> common/impala_connection.py:215: in execute 
> fetch_profile_after_close=fetch_profile_after_close) 
> beeswax/impala_beeswax.py:191: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:384: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:  E   CAUSED BY: IcebergTableLoadingException: Table does not exist 
> at location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> {code}
> *Standard Error*
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_b59d79db` 
> CASCADE;
> -- 2024-04-02 00:56:55,137 INFO MainThread: Started query 
> f34399a8b7cddd67:031a3b96
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_b59d79db`;
> -- 2024-04-02 00:56:57,302 INFO MainThread: Started query 
> 94465af69907eac5:e33f17e0
> -- 2024-04-02 00:56:57,353 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_b59d79db" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up 

[jira] [Commented] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853223#comment-17853223
 ] 

Joe McDonnell commented on IMPALA-13144:


We need to decide whether we want to track this with IMPALA-12967 (which was 
originally about "Table does not exist at location" on the same test) or keep 
it separate.

> TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O 
> error
> --
>
> Key: IMPALA-13144
> URL: https://issues.apache.org/jira/browse/IMPALA-13144
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> A couple test jobs hit a failure on 
> TestIcebergTable.test_migrated_table_field_id_resolution:
> {noformat}
> query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
> vector, unique_database)
> common/impala_test_suite.py:725: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:216: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
> open HDFS file 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
> E   Error(2): No such file or directory
> E   Root cause: RemoteException: File does not exist: 
> /test-warehouse/iceberg_migrated_alter_test/00_0
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
> E at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
> E at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> E at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13144:
--

 Summary: TestIcebergTable.test_migrated_table_field_id_resolution 
fails with Disk I/O error
 Key: IMPALA-13144
 URL: https://issues.apache.org/jira/browse/IMPALA-13144
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A couple test jobs hit a failure on 
TestIcebergTable.test_migrated_table_field_id_resolution:
{noformat}
query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
vector, unique_database)
common/impala_test_suite.py:725: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:660: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:1013: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:216: in execute
fetch_profile_after_close=fetch_profile_after_close)
beeswax/impala_beeswax.py:191: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:384: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:405: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error on 
impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
open HDFS file 
hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
E   Error(2): No such file or directory
E   Root cause: RemoteException: File does not exist: 
/test-warehouse/iceberg_migrated_alter_test/00_0
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
E   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
E   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
E   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
E   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
E   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
E   at java.security.AccessController.doPrivileged(Native Method)
E   at javax.security.auth.Subject.doAs(Subject.java:422)
E   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
E   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}






[jira] [Created] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13143:
--

 Summary: TestCatalogdHA.test_catalogd_failover_with_sync_ddl times 
out expecting query failure
 Key: IMPALA-13143
 URL: https://issues.apache.org/jira/browse/IMPALA-13143
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
intermittently with:
{noformat}
custom_cluster/test_catalogd_ha.py:472: in test_catalogd_failover_with_sync_ddl
self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
common/impala_test_suite.py:1216: in wait_for_state
self.wait_for_any_state(handle, [expected_state], timeout, client)
common/impala_test_suite.py:1234: in wait_for_any_state
raise Timeout(timeout_msg)
E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of the 
expected states [5], last known state 4{noformat}
This means the query succeeded even though we expected it to fail. This is 
currently limited to s3 jobs. In a different test, we saw issues because s3 is 
slower (see IMPALA-12616).

This test was introduced by IMPALA-13134: 
https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50






[jira] [Resolved] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12616.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

I think the s3 slowness version of this is fixed, so I'm going to resolve this.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Updated] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13139:
---
Description: 
When debugging TestRestart, I noticed that the debug_action set for one query 
stayed in effect for subsequent queries that didn't specify query_options.
{noformat}
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(query, query_options={"debug_action": 
DEBUG_ACTION})

...

# debug_action is still set for these queries:
    self.execute_query_expect_success(self.client, "select age from 
{}".format(tbl_name))
self.execute_query_expect_success(self.client,
        "alter table {} add columns (name string)".format(tbl_name))
    self.execute_query_expect_success(self.client, "select name from 
{}".format(tbl_name)){noformat}
The query options can be cleared with self.client.clear_configuration(), but 
this persistence is odd behavior, and it's unclear whether any tests rely on it.
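
The leak can be illustrated with a minimal sketch. FakeClient below is a hypothetical stand-in, not Impala's actual test client; only the clear_configuration() call mirrors the real method named above:

```python
# Hypothetical sketch of why per-query options leak: the client mutates
# shared configuration instead of scoping options to a single query.
class FakeClient:
    def __init__(self):
        self._options = {}

    def execute(self, query, query_options=None):
        if query_options:
            # Options merge into shared state and persist for later queries.
            self._options.update(query_options)
        return dict(self._options)  # return the options in effect

    def clear_configuration(self):
        self._options.clear()

client = FakeClient()
client.execute("alter table t add columns (age int)",
               query_options={"debug_action": "SLEEP@1000"})
leaked = client.execute("select age from t")  # debug_action still set
client.clear_configuration()
clean = client.execute("select age from t")   # options cleared
```

A fix along these lines would apply query_options for the one execute() call and restore the previous configuration afterwards, so later queries are unaffected.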

> Query options set via ImpalaTestSuite::execute_query_expect_success stay set 
> for subsequent queries
> ---
>
> Key: IMPALA-13139
> URL: https://issues.apache.org/jira/browse/IMPALA-13139
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> When debugging TestRestart, I noticed that the debug_action set for one query 
> stayed in effect for subsequent queries that didn't specify query_options.
> {noformat}
>     DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
>                     .format(debug_action_sleep_time_sec * 1000))
>     query = "alter table {} add columns (age int)".format(tbl_name)
>     handle = self.execute_query_async(query, query_options={"debug_action": 
> DEBUG_ACTION})
> ...
> # debug_action is still set for these queries:
>     self.execute_query_expect_success(self.client, "select age from 
> {}".format(tbl_name))
> self.execute_query_expect_success(self.client,
>         "alter table {} add columns (name string)".format(tbl_name))
>     self.execute_query_expect_success(self.client, "select name from 
> {}".format(tbl_name)){noformat}
> The query options can be cleared with self.client.clear_configuration(), but 
> this persistence is odd behavior, and it's unclear whether any tests rely on 
> it.






[jira] [Created] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13139:
--

 Summary: Query options set via 
ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries
 Key: IMPALA-13139
 URL: https://issues.apache.org/jira/browse/IMPALA-13139
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell









[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-06 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852961#comment-17852961
 ] 

Joe McDonnell commented on IMPALA-12616:


This looks timing-related. I was able to get the test to pass by adjusting 
some of the sleep times. It appears the catalog is slower on s3, so some 
operations don't finish in the time we expected.

 
{noformat}
    debug_action_sleep_time_sec = 10 (NEW: 30)
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(query, query_options={"debug_action": 
DEBUG_ACTION})

    # Wait a bit so the RPC from the catalogd arrives to the coordinator.
    time.sleep(0.5) (NEW: 5)

    self.cluster.catalogd.restart()

    # Wait for the query to finish.
    max_wait_time = (debug_action_sleep_time_sec
        + self.WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC + 10)
    self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
max_wait_time){noformat}
A successful timeline looks like this:

 
 # Submit an alter table that sleeps before processing the catalog update
 # Sleep a little bit so the catalog knows about the alter table
 # Restart the catalogd
 # The catalog sends an update via the statestore. This has the new catalog ID 
and causes this message: "There was an error processing the impalad catalog 
update. Requesting a full topic update to recover: CatalogException: Detected 
catalog service ID changes from 9c9f7ff13f0e4f72:a896bee4d52fd37e to 
da67610b2c304198:a05daf1bc3d6a4b3. Aborting updateCatalog()"
 # The catalogd sends a full topic update
 # The alter table wakes up and prints this message: Catalog service ID 
mismatch. Current ID: da67610b2c304198:a05daf1bc3d6a4b3. ID in response: 
9c9f7ff13f0e4f72:a896bee4d52fd37e. Catalogd may have been restarted. Waiting 
for new catalog update from statestore.
 # Either it times out or there are too many non-empty updates, and the alter 
table bails out with "W0506 22:42:10.316627 23066 impala-server.cc:2369] 
e14b23a22458ab75:6b269414] Ignoring catalog update result of catalog 
service ID 9c9f7ff13f0e4f72:a896bee4d52fd37e because it does not match with 
current catalog service ID da67610b2c304198:a05daf1bc3d6a4b3. The current 
catalog service ID may be stale (this may be caused by the catalogd having been 
restarted more than once) or newer than the catalog service ID of the update 
result."

If the alter table wakes up from its sleep before #5 happens, the alter table 
will see the catalog service ID change and fail. To avoid that, we adjust the 
WAIT_BEFORE_PROCESSING_CATALOG_UPDATE higher. I also lengthened the sleep in #2 
to give the initial catalog some extra time to hear about the alter table. The 
test verifies that the logs contain the expected messages, so this should be a 
safe modification to the test.
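
A more robust pattern than enlarging fixed sleeps is to poll for an observable condition with a generous deadline, so the test runs fast on a fast cluster and only waits long on slow s3 runs. This wait_for helper is an illustrative sketch, not part of Impala's test suite:

```python
import time

def wait_for(predicate, timeout_sec, poll_interval_sec=0.1):
    """Poll predicate() until it returns True or the deadline passes.

    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_sec)

# Example: instead of time.sleep(5) and hoping the RPC has arrived, poll an
# observable condition (this flag stands in for e.g. grepping the
# coordinator log for the expected message).
rpc_arrived = {"flag": False}
rpc_arrived["flag"] = True  # the event being waited for
assert wait_for(lambda: rpc_arrived["flag"], timeout_sec=5)
```

Polling against a deadline makes the timeout an upper bound rather than a hard-coded delay, which is exactly the property fixed sleeps lack on slow storage.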

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Created] (IMPALA-13132) Ozone jobs see intermittent termination of Ozone manager / HMS fails to start

2024-06-04 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13132:
--

 Summary: Ozone jobs see intermittent termination of Ozone manager 
/ HMS fails to start
 Key: IMPALA-13132
 URL: https://issues.apache.org/jira/browse/IMPALA-13132
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Ozone jobs load data/metadata snapshots during dataload, then restart the 
cluster. On this restart, the HMS sometimes fails to come up:
{noformat}
16:04:13  --> Starting Hive Metastore Service
16:04:13 No handlers could be found for logger "thrift.transport.TSocket"
16:04:14 Waiting for the Metastore at localhost:9083...
...
16:09:14 Waiting for the Metastore at localhost:9083...
16:09:14 Metastore service failed to start within 300.0 seconds.{noformat}
In the metastore logs, we see messages like this:
{noformat}
2024-06-04T08:37:06,425  INFO [main] retry.RetryInvocationHandler: 
com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
hostname/127.0.0.1 to localhost:9862 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
$Proxy31.submitRequest over nodeId=null,nodeAddress=localhost:9862 after 1 
failover attempts. Trying to failover after sleeping for 4000ms.{noformat}
It's trying to talk to the Ozone manager. The Ozone cluster was back up and 
running before trying to start the HMS, but then the Ozone manager received a 
signal and shut down:
{noformat}
24/06/04 08:36:37 ERROR om.OzoneManagerStarter: RECEIVED SIGNAL 15: SIGTERM
24/06/04 08:36:37 INFO om.OzoneManagerStarter: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down OzoneManager at hostname/127.0.0.1
/
24/06/04 08:36:37 INFO om.OzoneManager: om1[localhost:9862]: Stopping Ozone 
Manager{noformat}






[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-04 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851904#comment-17851904
 ] 

Joe McDonnell commented on IMPALA-12616:


I switched the code to use self.client.wait_for_finished_timeout(), which will 
stop if it reaches either FINISHED or EXCEPTION. Here is the error it hits:
{noformat}
custom_cluster/test_restart_services.py:238: in 
test_restart_catalogd_while_handling_rpc_response_with_timeout
finished = self.client.wait_for_finished_timeout(handle, max_wait_time)
common/impala_connection.py:247: in wait_for_finished_timeout
operation_handle.get_handle(), timeout)
beeswax/impala_beeswax.py:423: in wait_for_finished_timeout
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:CatalogException: Detected catalog service ID changes from 
b0019607521f4f0a:8340b9882af1a856 to a4f8584219b34182:9b3cf9af859a0d54. 
Aborting updateCatalog(){noformat}

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-03 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851840#comment-17851840
 ] 

Joe McDonnell commented on IMPALA-12616:


This is now failing pretty consistently on a variety of s3 jobs (but only s3 
jobs). I think the first thing we could do is modify wait_for_any_state() to 
detect the terminal state (EXCEPTION) and print the error. In general, it would 
be good for wait_for_state() to know about terminal states.
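
A terminal-state-aware wait could look like the following sketch. The state codes and the accessor callables are illustrative stand-ins, not the actual impala_test_suite API:

```python
import time

FINISHED, EXCEPTION = 4, 5  # illustrative codes matching the log above

def wait_for_state(get_state, get_error_log, expected_state, timeout_sec):
    """Wait for expected_state, but fail immediately with the error log if
    the query reaches the terminal EXCEPTION state first."""
    deadline = time.monotonic() + timeout_sec
    last_state = None
    while time.monotonic() < deadline:
        last_state = get_state()
        if last_state == expected_state:
            return last_state
        if last_state == EXCEPTION and expected_state != EXCEPTION:
            # Terminal state: no point waiting out the full timeout.
            raise RuntimeError("Query aborted: " + get_error_log())
        time.sleep(0.05)
    raise RuntimeError("query did not reach state %s, last known state %s"
                       % (expected_state, last_state))
```

With this shape, a run that aborts with a CatalogException surfaces the actual error message immediately instead of a generic Timeout after the full wait.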

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}






[jira] [Commented] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851831#comment-17851831
 ] 

Joe McDonnell commented on IMPALA-13128:


It looks intermittent, so I'm adding the "flaky" label.

> disk-file-test hangs on ARM + UBSAN test jobs
> -
>
> Key: IMPALA-13128
> URL: https://issues.apache.org/jira/browse/IMPALA-13128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
> this being the last output:
> {noformat}
> 23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   
> 43.42 sec
> 23:07:30 Start  64: disk-file-test
> 23:07:30 
> 18:47:00 
> 18:47:00  run-all-tests.sh TIMED OUT! {noformat}
> This has happened multiple times, but it looks limited to ARM + UBSAN. The 
> jobs take stack traces, but only of the running impalads / HMS.






[jira] [Created] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13128:
--

 Summary: disk-file-test hangs on ARM + UBSAN test jobs
 Key: IMPALA-13128
 URL: https://issues.apache.org/jira/browse/IMPALA-13128
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
this being the last output:
{noformat}
23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   43.42 
sec
23:07:30 Start  64: disk-file-test
23:07:30 
18:47:00 
18:47:00  run-all-tests.sh TIMED OUT! {noformat}
This has happened multiple times, but it looks limited to ARM + UBSAN. The jobs 
take stack traces, but only of the running impalads / HMS.






[jira] [Updated] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13128:
---
Labels: broken-build flaky  (was: broken-build)

> disk-file-test hangs on ARM + UBSAN test jobs
> -
>
> Key: IMPALA-13128
> URL: https://issues.apache.org/jira/browse/IMPALA-13128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
> this being the last output:
> {noformat}
> 23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   
> 43.42 sec
> 23:07:30 Start  64: disk-file-test
> 23:07:30 
> 18:47:00 
> 18:47:00  run-all-tests.sh TIMED OUT! {noformat}
> This has happened multiple times, but it looks limited to ARM + UBSAN. The 
> jobs take stack traces, but only of the running impalads / HMS.






[jira] [Resolved] (IMPALA-13127) custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs

2024-06-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13127.

Fix Version/s: Not Applicable
   Resolution: Duplicate

Fixed by a follow-up change in IMPALA-13040; closing as a duplicate.

> custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs
> -
>
> Key: IMPALA-13127
> URL: https://issues.apache.org/jira/browse/IMPALA-13127
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Not Applicable
>
>
> ASAN jobs have been intermittently hitting a failure in 
> custom_cluster.test_runtime_filter_aggregation.TestLateQueryStateInit.test_late_query_state_init():
> {noformat}
> custom_cluster/test_runtime_filter_aggregation.py:129: in 
> test_late_query_state_init
>     self.assert_log_contains('impalad_node1', 'INFO', log_pattern, expected)
> common/impala_test_suite.py:1383: in assert_log_contains
>     ", but found none." % (log_file_path, line_regex)
> E   AssertionError: Expected at least one line in file 
> /data0/jenkins/workspace/impala-cdwh-2024.0.18.0-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-077e.vpc.cloudera.com.jenkins.log.INFO.20240603-025918.3562162
>  matching regex 'UpdateFilterFromRemote RPC called with remaining wait time', 
> but found none.{noformat}
> Seen on an ARM job and an x86_64 job, so it is probably not 
> architecture-specific.






[jira] [Created] (IMPALA-13127) custom_cluster/test_runtime_filter_aggregation.py is failing on ASAN jobs

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13127:
--

 Summary: custom_cluster/test_runtime_filter_aggregation.py is 
failing on ASAN jobs
 Key: IMPALA-13127
 URL: https://issues.apache.org/jira/browse/IMPALA-13127
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


ASAN jobs have been intermittently hitting a failure in 
custom_cluster.test_runtime_filter_aggregation.TestLateQueryStateInit.test_late_query_state_init():
{noformat}
custom_cluster/test_runtime_filter_aggregation.py:129: in 
test_late_query_state_init
    self.assert_log_contains('impalad_node1', 'INFO', log_pattern, expected)
common/impala_test_suite.py:1383: in assert_log_contains
    ", but found none." % (log_file_path, line_regex)
E   AssertionError: Expected at least one line in file 
/data0/jenkins/workspace/impala-cdwh-2024.0.18.0-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-077e.vpc.cloudera.com.jenkins.log.INFO.20240603-025918.3562162
 matching regex 'UpdateFilterFromRemote RPC called with remaining wait time', 
but found none.{noformat}
Seen on an ARM job and an x86_64 job, so it is probably not 
architecture-specific.






[jira] [Created] (IMPALA-13125) Set of tests for exploration_strategy=exhaustive varies between python 2 and 3

2024-06-03 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13125:
--

 Summary: Set of tests for exploration_strategy=exhaustive varies 
between python 2 and 3
 Key: IMPALA-13125
 URL: https://issues.apache.org/jira/browse/IMPALA-13125
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


TLDR: Python 3 runs a different set of exhaustive tests than Python 2.

Longer version:

When looking into running Python 3 tests, I noticed that the set of tests run 
in exhaustive mode differs between Python 2 and Python 3. This was surprising.

It turns out there is a distinction between run-tests.py's 
--exploration_strategy=exhaustive option and its 
--workload_exploration_strategy="functional-query:exhaustive" option. The 
exhaustive job actually uses the latter. This means that individual 
functional-query workload classes see cls.exploration_strategy() == 
"exhaustive", but the logic that generates the test vectors still sees 
exploration_strategy=core and still uses pairwise generation. Code:
{noformat}
    if exploration_strategy == 'exhaustive':
      return self.__generate_exhaustive_combinations()
    elif exploration_strategy in ['core', 'pairwise']:
      return self.__generate_pairwise_combinations(){noformat}
[https://github.com/apache/impala/blob/master/tests/common/test_vector.py#L165-L168]
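For illustration, a minimal sketch of what exhaustive generation does (the dimension names here are hypothetical, not Impala's actual test vectors):

```python
import itertools

# Exhaustive: the full cross product of every dimension's values.
def exhaustive(dimensions):
    names = list(dimensions)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(dimensions[n] for n in names))]

dims = {'file_format': ['parquet', 'text'], 'compression': ['none', 'snappy']}
vectors = exhaustive(dims)
print(len(vectors))  # 2 x 2 = 4 full combinations
```

Pairwise generation instead only guarantees that each pair of values appears in at least one vector, so the iteration order of the dimensions dictionary (which changed between Python 2 and 3) can change which vectors are chosen.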

Python 3 changed how dictionaries iterate, which affects the order of test 
dimensions and which test vectors get picked. So, the Python 3 exhaustive tests 
are different. This may expose latent bugs, because some combinations that meet 
the constraints have never actually been run (e.g. some json encodings don't 
have the decimal_tiny table).

We can work to make them behave similarly, using pytest's --collect-only option 
to look at the differences (and compare them to actual existing runs).






[jira] [Created] (IMPALA-13124) Migrate tests that use the 'unittest' package to use normal pytest base class

2024-06-02 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13124:
--

 Summary: Migrate tests that use the 'unittest' package to use 
normal pytest base class
 Key: IMPALA-13124
 URL: https://issues.apache.org/jira/browse/IMPALA-13124
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Some tests use the 'unittest' package as the base class of their tests. pytest 
can run these, but when running the tests with python 3, they fail with this 
message:
{noformat}
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/runner.py:150:
 in __init__
    self.result = func()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:435:
 in _memocollect
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:315:
 in _memoizedcall
    res = function()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/main.py:435:
 in 
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:605:
 in collect
    return super(Module, self).collect()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:459:
 in collect
    res = self.makeitem(name, obj)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/python.py:471:
 in makeitem
    collector=self, name=name, obj=obj)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:724:
 in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:338:
 in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:333:
 in 
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:595:
 in execute
    return _wrapped_call(hook_impl.function(*args), self.execute)
../infra/python/env-gcc10.4.0-py3/lib/python3.7/site-packages/_pytest/vendored_packages/pluggy.py:249:
 in _wrapped_call
    wrap_controller.send(call_outcome)
E   RuntimeError: generator raised StopIteration{noformat}
Converting them to use the regular pytest base classes works fine with python 3 
(and also python 2).
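As a sketch of the conversion (the test names are illustrative, not actual Impala tests):

```python
# Before: unittest-style base class, which the vendored pytest trips over
# when collecting under python 3.
import unittest

class AdditionTestUnittest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

# After: a plain class with bare asserts, collected by pytest's normal
# conventions under both python 2 and python 3.
class TestAddition(object):
    def test_add(self):
        assert 1 + 1 == 2
```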






[jira] [Created] (IMPALA-13123) Add a way to run tests with python 3

2024-06-02 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13123:
--

 Summary: Add a way to run tests with python 3
 Key: IMPALA-13123
 URL: https://issues.apache.org/jira/browse/IMPALA-13123
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


As a first step towards switching to python 3, we need an option to run the 
tests using the toolchain python 3. For example, there could be an environment 
variable that tells tests/run-tests.py and bin/impala-py.test to use python 3.

This can be combined with a first round of fixes to get a decent number of 
tests running and see what is broken. The fixes must be compatible with python 
2, and the default will still be python 2.
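A minimal sketch of interpreter selection driven by an environment variable; the name IMPALA_USE_PYTHON3 is hypothetical, not a committed flag:

```python
import os

def pick_python(env=None):
    """Choose the interpreter for run-tests.py / impala-py.test (sketch).

    The IMPALA_USE_PYTHON3 variable name is an assumption for illustration.
    """
    env = os.environ if env is None else env
    if env.get("IMPALA_USE_PYTHON3") == "1":
        return "python3"   # toolchain python 3
    return "python2.7"     # the default remains python 2 for now
```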






[jira] [Assigned] (IMPALA-12686) Build the toolchain with basic debug information (-g1)

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12686:
--

Assignee: Joe McDonnell

> Build the toolchain with basic debug information (-g1)
> --
>
> Key: IMPALA-12686
> URL: https://issues.apache.org/jira/browse/IMPALA-12686
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Currently, we build most of the toolchain without debug information and 
> without "-fno-omit-frame-pointer". This makes it difficult to get reliable 
> stack traces that go through some of those libraries. We should build the 
> toolchain with basic debug information (-g1) to get reliable stack traces.
> For some libraries, we want to compile with full debug information (-g) to 
> allow the ability to step through the code with a debugger. Currently, ORC 
> and Kudu (and others) are built with -g and should stay that way. We should 
> add -g for Thrift.
> To save space, we should also enable compressed debug information (-gz) to 
> keep the sizes from growing too much (and reduce the size of existing debug 
> information).






[jira] [Resolved] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13057.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Incorporate tuple/slot information into the tuple cache key
> ---
>
> Key: IMPALA-13057
> URL: https://issues.apache.org/jira/browse/IMPALA-13057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Since the tuple and slot information is kept separately in the descriptor 
> table, it does not get incorporated into the PlanNode thrift used for the 
> tuple cache key. This means that the tuple cache can't distinguish between 
> these two queries:
> {noformat}
> select int_col1 from table;
> select int_col2 from table;{noformat}
> To solve this, the tuple/slot information needs to be incorporated into the 
> cache key. PlanNode::initThrift() walks through each tuple, so this is a good 
> place to serialize the TupleDescriptor/SlotDescriptors and incorporate it 
> into the hash.
> The tuple ids and slot ids are global ids, so the value is influenced by the 
> entirety of the query. This is a problem for matching cache results across 
> different queries. As part of incorporating the tuple/slot information, we 
> should also add an ability to translate tuple/slot ids into ids local to a 
> subtree.
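One way to make ids subtree-local is to renumber them in order of first appearance, so identical subtrees hash identically regardless of the global ids the planner assigned. This is a sketch with illustrative names, not the actual Impala code:

```python
def to_local_ids(global_ids):
    """Map global tuple/slot ids to ids local to a subtree (sketch).

    Ids are renumbered in order of first appearance, so two structurally
    identical subtrees yield the same local id sequence.
    """
    mapping = {}
    for gid in global_ids:
        if gid not in mapping:
            mapping[gid] = len(mapping)
    return [mapping[g] for g in global_ids]

# Two queries whose subtrees use different global ids but have the same
# structure produce the same sequence, so their cache keys can match.
print(to_local_ids([7, 9, 7, 12]))  # [0, 1, 0, 2]
```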






[jira] [Resolved] (IMPALA-13072) Toolchain: Add retries for uploading artifacts to the s3 buckets

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13072.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Toolchain: Add retries for uploading artifacts to the s3 buckets
> 
>
> Key: IMPALA-13072
> URL: https://issues.apache.org/jira/browse/IMPALA-13072
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> On ARM toolchain builds, we have seen some failures to upload tarballs to s3:
> {noformat}
> 22:17:06 impala-toolchain-redhat8: Uploading 
> /mnt/build/llvm-5.0.1-asserts-p7-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/33-f93e2c9a86/llvm/5.0.1-asserts-p7-gcc-10.4.0/llvm-5.0.1-asserts-p7-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 22:17:06 impala-toolchain-redhat8: /mnt/functions.sh: line 385: 680012 
> Segmentation fault      (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"{noformat}
> Since we do many uploads, even a relatively low failure rate can make it hard 
> to get a passing build. We should change the code to retry the upload.






[jira] [Assigned] (IMPALA-13073) Toolchain builds should pass VERBOSE=1 into make

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13073:
--

Assignee: Joe McDonnell

> Toolchain builds should pass VERBOSE=1 into make
> 
>
> Key: IMPALA-13073
> URL: https://issues.apache.org/jira/browse/IMPALA-13073
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> It is useful to be able to examine the compilation flags for toolchain 
> components. Sometimes we want to add -fno-omit-frame-pointer or add debug 
> symbols with -g1 and verify that they actually get set. For projects that use 
> CMake, the output often does not print the compile command. CMake can produce 
> a compilation database, but it is simpler to have make print the compilation 
> command by adding VERBOSE=1. The output isn't that big and it gets redirected 
> to a file, so it seems like we could leave it on by default.






[jira] [Commented] (IMPALA-13072) Toolchain: Add retries for uploading artifacts to the s3 buckets

2024-05-31 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851229#comment-17851229
 ] 

Joe McDonnell commented on IMPALA-13072:


Fixed by this commit:
{noformat}
commit f601ec33f2bcfaab19a46cff5fc6f0a90e22da8d
Author: Joe McDonnell 
Date:   Fri May 10 17:22:56 2024 -0700

    IMPALA-13072: Add retries for s3 uploads to combat flakiness
    
    On ARM toolchain builds, we have seen some uploads to s3 fail
    with a segementation fault. Given the number of artifacts that
    the toolchain uploads, even a relatively low error rate can
    make it hard to get a passing build. This modifies the s3
    upload code to retry up to 10 times to avoid this flakiness.
    
    Testing:
     - Ran an ARM toolchain build and saw the retry happen
       successfully
     - Ran a toolchain build with an invalid s3 bucket and verified
       it failed after 10 retries
    
    Change-Id: I95d858c99e965730303c2bfd90478ac5f68acf83
    Reviewed-on: http://gerrit.cloudera.org:8080/21421
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}
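The retry pattern the commit describes can be sketched as follows. The real fix lives in the native-toolchain's shell functions; this Python version is only illustrative:

```python
import subprocess
import time

def run_with_retries(cmd, max_attempts=10, delay_s=0):
    """Run cmd, retrying up to max_attempts times on a nonzero exit (sketch)."""
    for attempt in range(1, max_attempts + 1):
        if subprocess.call(cmd) == 0:
            return True
        if attempt < max_attempts and delay_s:
            time.sleep(delay_s)  # optional pause between attempts
    return False
```

The upload call would then be wrapped like run_with_retries(['aws', 's3', 'cp', '--only-show-errors', src, dest]) (argument names hypothetical).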

> Toolchain: Add retries for uploading artifacts to the s3 buckets
> 
>
> Key: IMPALA-13072
> URL: https://issues.apache.org/jira/browse/IMPALA-13072
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> On ARM toolchain builds, we have seen some failures to upload tarballs to s3:
> {noformat}
> 22:17:06 impala-toolchain-redhat8: Uploading 
> /mnt/build/llvm-5.0.1-asserts-p7-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/33-f93e2c9a86/llvm/5.0.1-asserts-p7-gcc-10.4.0/llvm-5.0.1-asserts-p7-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 22:17:06 impala-toolchain-redhat8: /mnt/functions.sh: line 385: 680012 
> Segmentation fault      (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"{noformat}
> Since we do many uploads, even a relatively low failure rate can make it hard 
> to get a passing build. We should change the code to retry the upload.






[jira] [Resolved] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13111.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> impala-gdb.py's find-query-ids/find-fragment-instances return unusable query 
> ids
> 
>
> Key: IMPALA-13111
> URL: https://issues.apache.org/jira/browse/IMPALA-13111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
> information about the queries / fragments running in a core file. However, 
> the query/fragment ids that it returns have issues with the signedness of the 
> integers:
> {noformat}
> (gdb) find-fragment-instances
> Fragment Instance Id    Thread IDs
> -23b76c1699a831a1:279358680036    [117120]
> -23b76c1699a831a1:279358680037    [117121]
> -23b76c1699a831a1:279358680038    [117122]
> ..
> (gdb) find-query-ids
> -3cbda1606b3ade7c:f170c4bd
> -23b76c1699a831a1:27935868
> 68435df1364aa90f:1752944f
> 3442ed6354c7355d:78c83d20{noformat}
> The low values for find-query-ids don't have this problem, because it is 
> ANDed with 0x:
> {noformat}
>             qid_low = format(int(qid_low, 16) & 0x, 
> 'x'){noformat}
> We can fix the other locations by ANDing with 0x.






[jira] [Assigned] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13121:
--

Assignee: Joe McDonnell

> Move the toolchain to a newer version of ccache
> ---
>
> Key: IMPALA-13121
> URL: https://issues.apache.org/jira/browse/IMPALA-13121
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
> debug info, I ran into a case where the debug level was not what I expected. 
> I had added a -g0 at the end to turn off debug information for the cmake 
> build, but it still ended up with debug info.
> The release notes for ccache 3.3.5 say this:
>  * Fixed a regression where the original order of debug options could be 
> lost. This reverts the “Improved parsing of {{-g*}} options” feature in 
> ccache 3.3.
> [https://ccache.dev/releasenotes.html#_ccache_3_3_5]
> I think I may have been hitting that. We should upgrade ccache to a more 
> recent version.






[jira] [Assigned] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-31 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13111:
--

Assignee: Joe McDonnell

> impala-gdb.py's find-query-ids/find-fragment-instances return unusable query 
> ids
> 
>
> Key: IMPALA-13111
> URL: https://issues.apache.org/jira/browse/IMPALA-13111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
> information about the queries / fragments running in a core file. However, 
> the query/fragment ids that it returns have issues with the signedness of the 
> integers:
> {noformat}
> (gdb) find-fragment-instances
> Fragment Instance Id    Thread IDs
> -23b76c1699a831a1:279358680036    [117120]
> -23b76c1699a831a1:279358680037    [117121]
> -23b76c1699a831a1:279358680038    [117122]
> ..
> (gdb) find-query-ids
> -3cbda1606b3ade7c:f170c4bd
> -23b76c1699a831a1:27935868
> 68435df1364aa90f:1752944f
> 3442ed6354c7355d:78c83d20{noformat}
> The low values for find-query-ids don't have this problem, because it is 
> ANDed with 0x:
> {noformat}
>             qid_low = format(int(qid_low, 16) & 0x, 
> 'x'){noformat}
> We can fix the other locations by ANDing with 0x.






[jira] [Created] (IMPALA-13121) Move the toolchain to a newer version of ccache

2024-05-31 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13121:
--

 Summary: Move the toolchain to a newer version of ccache
 Key: IMPALA-13121
 URL: https://issues.apache.org/jira/browse/IMPALA-13121
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The native-toolchain currently uses ccache 3.3.3. In a recent change adding 
debug info, I ran into a case where the debug level was not what I expected. I 
had added a -g0 at the end to turn off debug information for the cmake build, 
but it still ended up with debug info.

The release notes for ccache 3.3.5 say this:
 * Fixed a regression where the original order of debug options could be lost. 
This reverts the “Improved parsing of {{-g*}} options” feature in ccache 3.3.

[https://ccache.dev/releasenotes.html#_ccache_3_3_5]

I think I may have been hitting that. We should upgrade ccache to a more recent 
version.






[jira] [Created] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-28 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13111:
--

 Summary: impala-gdb.py's find-query-ids/find-fragment-instances 
return unusable query ids
 Key: IMPALA-13111
 URL: https://issues.apache.org/jira/browse/IMPALA-13111
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
information about the queries / fragments running in a core file. However, the 
query/fragment ids that it returns have issues with the signedness of the 
integers:
{noformat}
(gdb) find-fragment-instances
Fragment Instance Id    Thread IDs
-23b76c1699a831a1:279358680036    [117120]
-23b76c1699a831a1:279358680037    [117121]
-23b76c1699a831a1:279358680038    [117122]
..

(gdb) find-query-ids
-3cbda1606b3ade7c:f170c4bd
-23b76c1699a831a1:27935868
68435df1364aa90f:1752944f
3442ed6354c7355d:78c83d20{noformat}
The low values for find-query-ids don't have this problem, because the value is 
ANDed with 0x:
{noformat}
            qid_low = format(int(qid_low, 16) & 0x, 
'x'){noformat}
We can fix the other locations by ANDing with 0x.
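The fix amounts to masking the signed value down to its unsigned representation before formatting. The mask literals above were elided by the mail archiver; this sketch assumes a 64-bit mask, which is my assumption for the id halves:

```python
def to_unsigned_hex(value, bits=64):
    """Format a (possibly negative) id half as unsigned hex.

    The 64-bit width is an assumption for illustration; the masked-out
    literals in the archived message are not recoverable.
    """
    return format(value & ((1 << bits) - 1), 'x')

# A negative high half such as -0x23b76c1699a831a1 becomes a usable id string:
print(to_unsigned_hex(-0x23b76c1699a831a1))
```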






[jira] [Assigned] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13020:
--

Assignee: Joe McDonnell

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patched Thrift that allows an int64_t max message size and that limit 
> set to a larger value, Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as Thrift did not 
> previously enforce this limit, so this is something we should fix to avoid 
> upgrade issues.
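The arithmetic behind the failure is simple: the 2.27 GB update logged by the statestore cannot fit under a signed 32-bit cap.

```python
INT32_MAX = 2**31 - 1             # Thrift's max message size is a signed int32
topic_size = int(2.27 * 1024**3)  # ~2.27 GB, the size from the statestore log
print(topic_size > INT32_MAX)     # True: deserialization hits the limit
```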






[jira] [Resolved] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13020.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patched Thrift that allows an int64_t max message size and that limit 
> set to a larger value, Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as Thrift did not 
> previously enforce this limit, so this is something we should fix to avoid 
> upgrade issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13082) Use separate versions for jackson-databind vs jackson-core, etc.

2024-05-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13082:
--

 Summary: Use separate versions for jackson-databind vs 
jackson-core, etc.
 Key: IMPALA-13082
 URL: https://issues.apache.org/jira/browse/IMPALA-13082
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


We have a single jackson-databind.version property, populated from 
IMPALA_JACKSON_DATABIND_VERSION. This currently sets the version for 
jackson-databind as well as other jackson libraries like jackson-core.

Sometimes there is a jackson-databind patch release without a release of other 
jackson libraries. For example, there is a jackson-databind 2.12.7.1, but there 
is no jackson-core 2.12.7.1. There is only jackson-core 2.12.7. To handle these 
patch scenarios, it is useful to split out the jackson-databind version from 
the version for other jackson libraries.
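
For illustration, a sketch of how the split could look in a Maven pom. The 
property names here are illustrative, not necessarily the ones Impala uses:
{noformat}
<!-- Separate version properties so jackson-databind can take a patch
     release (e.g. 2.12.7.1) while the other jackson artifacts stay on
     the base release (2.12.7). Property names are illustrative. -->
<properties>
  <jackson-databind.version>2.12.7.1</jackson-databind.version>
  <jackson.version>2.12.7</jackson.version>
</properties>
...
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson-databind.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>${jackson.version}</version>
</dependency>
{noformat}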



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13073) Toolchain builds should pass VERBOSE=1 into make

2024-05-11 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13073:
--

 Summary: Toolchain builds should pass VERBOSE=1 into make
 Key: IMPALA-13073
 URL: https://issues.apache.org/jira/browse/IMPALA-13073
 Project: IMPALA
  Issue Type: Improvement
Reporter: Joe McDonnell


It is useful to be able to examine the compilation flags for toolchain 
components. Sometimes we want to add -fno-omit-frame-pointer or add debug 
symbols with -g1 and verify that the flag actually gets set. For projects that 
use CMake, the output often does not print the compile command. CMake can 
produce a compilation database, but it is simpler to have make print the 
compilation command by adding VERBOSE=1. The extra output isn't that large and 
gets redirected to a file anyway, so it seems like we could leave it on by 
default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13072) Toolchain: Add retries for uploading artifacts to the s3 buckets

2024-05-11 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13072:
--

 Summary: Toolchain: Add retries for uploading artifacts to the s3 
buckets
 Key: IMPALA-13072
 URL: https://issues.apache.org/jira/browse/IMPALA-13072
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


On ARM toolchain builds, we have seen some failures to upload tarballs to s3:
{noformat}
22:17:06 impala-toolchain-redhat8: Uploading 
/mnt/build/llvm-5.0.1-asserts-p7-gcc-10.4.0.tar.gz to 
s3://native-toolchain/build/33-f93e2c9a86/llvm/5.0.1-asserts-p7-gcc-10.4.0/llvm-5.0.1-asserts-p7-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
22:17:06 impala-toolchain-redhat8: /mnt/functions.sh: line 385: 680012 
Segmentation fault      (core dumped) aws s3 cp --only-show-errors 
"${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"{noformat}
Since we do many uploads, even a relatively low failure rate can make it hard 
to get a passing build. We should change the code to retry the upload.
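
A retry wrapper along these lines could work. This is a sketch, assuming the 
upload stays driven by the aws CLI as in the log above; the function name and 
backoff values are made up. A segfault like the one shown surfaces as a 
nonzero return code, so it triggers another attempt like any other failure:
{code:python}
import subprocess
import time

def run_with_retries(cmd, attempts=3, backoff_s=2):
    """Run cmd, retrying on any nonzero exit (including crashes like the
    segfaulting `aws s3 cp` above). Returns True on success."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(backoff_s * attempt)  # simple linear backoff
    return False

# e.g. run_with_retries(
#     ["aws", "s3", "cp", "--only-show-errors", package_tgz, s3_destination])
{code}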



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13068) Add tests for integration with dbt

2024-05-10 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13068:
--

 Summary: Add tests for integration with dbt
 Key: IMPALA-13068
 URL: https://issues.apache.org/jira/browse/IMPALA-13068
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Some Impala users rely on dbt and dbt's integration with Impala for their 
workloads. It would be useful to have some basic tests / scripts for running 
dbt against Impala. This provides a smoke test for functionality. It also makes 
it easier for developers to debug dbt issues locally, as the development 
environment would already have dbt set up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10451) TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive to have HIVE-24157

2024-05-09 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-10451.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive 
> to have HIVE-24157
> ---
>
> Key: IMPALA-10451
> URL: https://issues.apache.org/jira/browse/IMPALA-10451
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> TestAvroSchemaResolution.test_avro_schema_resolution recently fails when 
> building against a Hive version with HIVE-24157.
> {code:java}
> query_test.test_avro_schema_resolution.TestAvroSchemaResolution.test_avro_schema_resolution[protocol:
>  beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> avro/snap/block] (from pytest)
> query_test/test_avro_schema_resolution.py:36: in test_avro_schema_resolution
>  self.run_test_case('QueryTest/avro-schema-resolution', vector, 
> unique_database)
> common/impala_test_suite.py:690: in run_test_case
>  self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
>  replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
>  VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>  assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E 10 != 0 
> {code}
> The failed query is
> {code:sql}
> select count(*) from functional_avro_snap.avro_coldef {code}
> The cause is that data loading for avro_coldef failed. The DML is
> {code:sql}
> INSERT OVERWRITE TABLE avro_coldef PARTITION(year=2014, month=1)
> SELECT bool_col, tinyint_col, smallint_col, int_col, bigint_col,
> float_col, double_col, date_string_col, string_col, timestamp_col
> FROM (select * from functional.alltypes order by id limit 5) a;
> {code}
> The failure (found in HS2) is:
> {code}
> 2021-01-24T01:52:16,340 ERROR [9433ee64-d706-4fa4-a146-18d71bf17013 
> HiveServer2-Handler-Pool: Thread-4946] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting DATE/TIMESTAMP 
> types to NUMERIC is prohibited (hive.strict.timestamp.conversion)
>  at 
> org.apache.hadoop.hive.ql.udf.TimestampCastRestrictorResolver.getEvalMethod(TimestampCastRestrictorResolver.java:62)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:168)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:260)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:292)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDescWithUdfData(TypeCheckProcFactory.java:987)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.createConversionCast(ParseUtils.java:163)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:8551)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7908)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11100)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10972)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11901)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11771)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> 

[jira] [Created] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key

2024-05-06 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13057:
--

 Summary: Incorporate tuple/slot information into the tuple cache 
key
 Key: IMPALA-13057
 URL: https://issues.apache.org/jira/browse/IMPALA-13057
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Since the tuple and slot information is kept separately in the descriptor 
table, it does not get incorporated into the PlanNode thrift used for the tuple 
cache key. This means that the tuple cache can't distinguish between these two 
queries:
{noformat}
select int_col1 from table;
select int_col2 from table;{noformat}
To solve this, the tuple/slot information needs to be incorporated into the 
cache key. PlanNode::initThrift() walks through each tuple, so it is a good 
place to serialize the TupleDescriptors/SlotDescriptors and incorporate them 
into the hash.

The tuple ids and slot ids are global ids, so the value is influenced by the 
entirety of the query. This is a problem for matching cache results across 
different queries. As part of incorporating the tuple/slot information, we 
should also add an ability to translate tuple/slot ids into ids local to a 
subtree.
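
The idea can be sketched as follows. This is not Impala's actual code; the 
function name and the (local id, column, type) slot encoding are made up for 
illustration. Folding the slot metadata into the key makes the two queries 
above hash differently even though the plan shape is identical:
{code:python}
import hashlib

def tuple_cache_key(plan_node_thrift, slot_descriptors):
    """Fold serialized slot metadata into the key alongside the PlanNode
    thrift. Slot ids here are local to the subtree, not global, so the same
    subtree in different queries can still produce matching keys."""
    h = hashlib.sha256(plan_node_thrift)
    for slot in slot_descriptors:  # e.g. (local_slot_id, col_name, col_type)
        h.update(repr(slot).encode())
    return h.hexdigest()

# Same plan shape, different columns -> different keys once slots are hashed.
k1 = tuple_cache_key(b"SCAN table", [(0, "int_col1", "INT")])
k2 = tuple_cache_key(b"SCAN table", [(0, "int_col2", "INT")])
{code}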



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13049) Add dependency management for the log4j2 version

2024-05-01 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13049.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add dependency management for the log4j2 version
> 
>
> Key: IMPALA-13049
> URL: https://issues.apache.org/jira/browse/IMPALA-13049
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> In some internal builds, we see cases where one dependency brings in one 
> version of log4j2 and another brings in a different version on a different 
> artifact. In particular, we have seen cases where Hive brings in log4j-api 
> 2.17.1 while something else brings in log4j-core 2.18.0. This is a bad 
> combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class 
> existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result 
> in class not found exceptions.
> Impala itself uses reload4j rather than log4j2, so this is purely about 
> coordinating dependencies rather than Impala code.
> We should add dependency management for log4j-api and log4j-core. It makes 
> sense to standardize on 2.18.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13049) Add dependency management for the log4j2 version

2024-04-30 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13049:
--

 Summary: Add dependency management for the log4j2 version
 Key: IMPALA-13049
 URL: https://issues.apache.org/jira/browse/IMPALA-13049
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend, Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


In some internal builds, we see cases where one dependency brings in one 
version of log4j2 and another brings in a different version on a different 
artifact. In particular, we have seen cases where Hive brings in log4j-api 
2.17.1 while something else brings in log4j-core 2.18.0. This is a bad 
combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class 
existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result in 
class not found exceptions.

Impala itself uses reload4j rather than log4j2, so this is purely about 
coordinating dependencies rather than Impala code.

We should add dependency management for log4j-api and log4j-core. It makes 
sense to standardize on 2.18.0.
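
A sketch of the corresponding Maven dependencyManagement entries, pinning both 
artifacts to the same release so one cannot drift ahead of the other:
{noformat}
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.18.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.18.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{noformat}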



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-04-18 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838830#comment-17838830
 ] 

Joe McDonnell commented on IMPALA-13020:


[~stigahuang] pointed out that the exception is not being printed due to this 
logic in TAcceptQueueServer.cpp (see 
[https://github.com/apache/impala/blob/master/be/src/rpc/TAcceptQueueServer.cpp#L91]
 ):
{noformat}
    } catch (const TTransportException& ttx) {
      if (ttx.getType() != TTransportException::END_OF_FILE) {
        string errStr = string("TAcceptQueueServer client died: ") + ttx.what();
        GlobalOutput(errStr.c_str());
      }{noformat}
The MaxMessageSize exception uses END_OF_FILE, so it doesn't get printed. If we 
print all exceptions, we get:
{noformat}
I0418 18:39:55.026305 3241592 thrift-util.cc:198] TAcceptQueueServer client 
died: MaxMessageSize reached{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-04-18 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838828#comment-17838828
 ] 

Joe McDonnell commented on IMPALA-13020:


One way to make the catalog-update topic large is to create tables with many 
partitions. Example:
{noformat}
set mt_dop=20;
create table {0} partitioned by (ss_sold_time_sk) as select ss_item_sk, 
ss_sold_time_sk from tpcds.store_sales;{noformat}
Each of these tables adds about 25MB to the catalog-update topic, so getting to 
about 90 of these tables puts the topic well above 2GB.

This is resource intensive and needs a sizeable machine. Starting the cluster 
with a single impalad (i.e. bin/start-impala-cluster.py -s 1) helps keep the 
memory footprint down. There may be easier / less resource-intensive ways to 
do this.
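
For reference, a tiny driver for the repro above could look like this. It is a 
sketch: the big_topic_ table-name prefix is made up, and the ~25MB-per-table 
figure comes from the comment above (90 * 25MB puts the topic past the 2GB 
limit):
{code:python}
# Emit ~90 CTAS statements to feed to impala-shell; each table adds roughly
# 25MB to the catalog-update topic.
stmts = ["set mt_dop=20;"]
for i in range(90):
    stmts.append(
        "create table big_topic_{0} partitioned by (ss_sold_time_sk) as "
        "select ss_item_sk, ss_sold_time_sk from tpcds.store_sales;".format(i))
print("\n".join(stmts))
{code}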




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-04-18 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13020:
--

 Summary: catalog-topic updates >2GB do not work due to Thrift's 
max message size
 Key: IMPALA-13020
 URL: https://issues.apache.org/jira/browse/IMPALA-13020
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


Thrift 0.16.0 added a max message size to protect against malicious packets 
that can consume a large amount of memory on the receiver side. This max 
message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
via thrift_rpc_max_message_size).

In catalog v1, the catalog-update statestore topic can become larger than 2GB 
when there are a large number of tables / partitions / files. If this happens 
and an Impala coordinator needs to start up (or needs a full topic update for 
any other reason), it is expecting the statestore to send it the full topic 
update, but the coordinator actually can't process the message. The 
deserialization of the message hits the 2GB max message size limit and fails.

On the statestore side, it shows this message:
{noformat}
I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
catalog-update topic update for impa...@mcdonnellthrift.vpc.cloudera.com:27000. 
Size = 2.27 GB
I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
send() : Broken pipe
I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
client for mcdonnellthrift.vpc.cloudera.com:23000
I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
send() : Broken pipe
I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
pipe)
I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
send() : Broken pipe
I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
rpc: N6impala20TUpdateStateResponseE, send: not done
I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
On the Impala side, it doesn't give a good error, but we see this:
{noformat}
I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
server StatestoreSubscriber from client 
I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog to 
be initialized, attempt: 110
I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog to 
be initialized, attempt: 111
I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog to 
be initialized, attempt: 112
I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog to 
be initialized, attempt: 113{noformat}
With a patched Thrift that allows an int64_t max message size and that limit set 
to a larger value, Impala was able to start up (even without restarting the 
statestored).

Some clusters that upgrade to a newer version may hit this, as Thrift did not 
previously enforce this limit, so this is something we should fix to avoid 
upgrade issues.
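
A back-of-the-envelope check of the failure mode, assuming the log's "2.27 GB" 
uses 2^30-byte units: the serialized topic update simply doesn't fit under the 
largest value a signed 32-bit max message size can hold.
{code:python}
INT32_MAX = 2**31 - 1                   # ceiling for thrift_rpc_max_message_size
topic_update_bytes = int(2.27 * 2**30)  # ~2.27 GB, as logged by the statestore

# Deserialization fails because the frame exceeds the configured maximum.
exceeds_limit = topic_update_bytes > INT32_MAX
{code}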



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13015) Dataload fails due to concurrency issue with test.jceks

2024-04-18 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13015:
--

Assignee: Yida Wu

> Dataload fails due to concurrency issue with test.jceks
> ---
>
> Key: IMPALA-13015
> URL: https://issues.apache.org/jira/browse/IMPALA-13015
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Yida Wu
>Priority: Major
>  Labels: flaky
>
> When doing dataload locally, it fails with this error:
> {noformat}
> Traceback (most recent call last):
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in 
> 
>     if __name__ == "__main__": main()
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in 
> main
>     os.remove(jceks_path)
> OSError: [Errno 2] No such file or directory: 
> '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
> Background task Loading functional-query data (pid 501094) failed.
> {noformat}
> testdata/bin/create-load-data.sh calls bin/load-data.py for functional, 
> TPC-H, and TPC-DS in parallel, so this logic has race conditions:
> {noformat}
>   jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
>   if os.path.exists(jceks_path):
>     os.remove(jceks_path){noformat}
> I don't see a specific reason for this to be in bin/load-data.py. It should 
> be moved somewhere that doesn't run in parallel; one possibility is to add a 
> step in testdata/bin/create-load-data.sh.
> This was introduced in 
> [https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13015) Dataload fails due to concurrency issue with test.jceks

2024-04-18 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13015:
---
Labels: flaky  (was: )

> Dataload fails due to concurrency issue with test.jceks
> ---
>
> Key: IMPALA-13015
> URL: https://issues.apache.org/jira/browse/IMPALA-13015
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Major
>  Labels: flaky
>
> When doing dataload locally, it fails with this error:
> {noformat}
> Traceback (most recent call last):
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in 
> 
>     if __name__ == "__main__": main()
>   File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in 
> main
>     os.remove(jceks_path)
> OSError: [Errno 2] No such file or directory: 
> '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
> Background task Loading functional-query data (pid 501094) failed.
> {noformat}
> testdata/bin/create-load-data.sh calls bin/load-data.py for functional, 
> TPC-H, and TPC-DS in parallel, so this logic has race conditions:
> {noformat}
>   jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
>   if os.path.exists(jceks_path):
>     os.remove(jceks_path){noformat}
> I don't see a specific reason for this to be in bin/load-data.py. It should 
> be moved somewhere else that doesn't run in parallel. One possible location 
> is to add a step in testdata/bin/create-load-data.sh
> This was introduced in 
> [https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]






[jira] [Created] (IMPALA-13015) Dataload fails due to concurrency issue with test.jceks

2024-04-18 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13015:
--

 Summary: Dataload fails due to concurrency issue with test.jceks
 Key: IMPALA-13015
 URL: https://issues.apache.org/jira/browse/IMPALA-13015
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


When doing dataload locally, it fails with this error:
{noformat}
Traceback (most recent call last):
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in <module>
    if __name__ == "__main__": main()
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in main
    os.remove(jceks_path)
OSError: [Errno 2] No such file or directory: 
'/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
Background task Loading functional-query data (pid 501094) failed.
{noformat}
testdata/bin/create-load-data.sh calls bin/load-data.py for functional, TPC-H, 
and TPC-DS in parallel, so this logic has race conditions:
{noformat}
  jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
  if os.path.exists(jceks_path):
    os.remove(jceks_path){noformat}
I don't see a specific reason for this to be in bin/load-data.py. It should be 
moved somewhere that doesn't run in parallel; one option is to add a step in 
testdata/bin/create-load-data.sh.

This was introduced in 
[https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]
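The check-then-remove pattern quoted above is racy: a parallel load-data.py invocation can delete test.jceks between the os.path.exists() check and the os.remove() call. Even if the cleanup stays in Python, the crash itself is avoidable; a minimal sketch of a race-tolerant removal (illustrative only, not the committed fix):

```python
import errno
import os


def remove_if_exists(path):
    """Remove a file, tolerating concurrent deletion by another process."""
    try:
        os.remove(path)
    except OSError as e:
        # ENOENT: another loader already removed it, which is fine.
        if e.errno != errno.ENOENT:
            raise
```

Moving the cleanup into a non-parallel step of testdata/bin/create-load-data.sh, as suggested above, remains the cleaner fix; the sketch only removes the spurious failure.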






[jira] [Resolved] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12689.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Assigned] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12689:
--

Assignee: Joe McDonnell

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Commented] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838398#comment-17838398
 ] 

Joe McDonnell commented on IMPALA-12689:


Fixed by:
{noformat}
commit cd9260e5276d0e342b21869c51e71aea9643504c
Author: Joe McDonnell 
Date:   Thu Feb 15 18:22:15 2024 -0800

    IMPALA-12689: Change TPC-H and TPC-DS builds to respect CFLAGS
    
    The TPC-H and TPC-DS builds currently do not respect the
    CFLAGS environment variable, so they don't incorporate the
    values that we set in init-compiler.sh.
    
    This modifies the build scripts for TPC-H and TPC-DS to
    patch their makefiles to add our CFLAGS. This has the
    side effect of turning on -O3 optimization, resulting
    in faster binaries used to generate the TPC-H and
    TPC-DS datasets:
    
    TPC-H's dbgen at scale 42:
    Unoptimized: 4m46.269s
    Optimized: 3m46.379s
    
    TPC-DS's dsdgen at scale 20:
    Unoptimized: 9m41.441s
    Optimized: 7m25.017s
    
    Testing:
     - Ran a build and verified that the flags include our
       CFLAGS value
    
    Change-Id: I3f999b71c56a72c14f1beeea99a3689b82a4d45a
    Reviewed-on: http://gerrit.cloudera.org:8080/2
    Reviewed-by: Michael Smith 
    Tested-by: Joe McDonnell 
{noformat}

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Major
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Commented] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838396#comment-17838396
 ] 

Joe McDonnell commented on IMPALA-12900:


Fixed by
{noformat}
commit ce8bc71ff7e0cfc0c39511308df8841f39fb03d2
Author: Joe McDonnell 
Date:   Tue Mar 12 21:47:12 2024 -0700

    IMPALA-12900: Build binutils with -O3
    
    The binutils build happens before we have switched over to
    using the toolchain compiler. This means that it also does
    not set CFLAGS/CXXFLAGS. The default optimization level
    for binutils is -O2. It is possible that we could get a bit
    extra speed by using -O3, so this sets CFLAGS/CXXFLAGS to use
    -O3 for binutils.
    
    Testing:
     - Toolchain builds on x86_64 and ARM
    
    Change-Id: I2e75db0759b4d3d4e6cc2ce929b1741808f1b771
    Reviewed-on: http://gerrit.cloudera.org:8080/21145
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Assigned] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12900:
--

Assignee: Joe McDonnell

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Resolved] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12900.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Assigned] (IMPALA-12905) Implement disk-based tuple caching

2024-04-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12905:
--

Assignee: Joe McDonnell

> Implement disk-based tuple caching
> --
>
> Key: IMPALA-12905
> URL: https://issues.apache.org/jira/browse/IMPALA-12905
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The TupleCacheNode caches tuples to be reused later for equivalent queries. 
> This tracks implementing a version that serializes tuples and stores them as 
> files on local disk. 
> This will have a few parts:
>  # There is a TupleCacheMgr that keeps track of what entries exist in the 
> cache and evicts entries as needed to make space for new entries. This will 
> be configured using startup flags to specify the directory, size, and cache 
> eviction policy.
>  # The TupleCacheNode will interact with the TupleCacheMgr to determine if 
> the entry is available. If it is, it reads the associated tuple cache file 
> and returns the RowBatches. If the entry does not exist, it reads RowBatches 
> from its child and stores them to a new file in the cache.
>  # The TupleReader / TupleWriter implement serialization / deserialization of 
> RowBatches to/from a local file. This uses the existing serialization used 
> for KRPC.
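To make part 1 concrete, here is a hedged sketch of the eviction bookkeeping such a cache manager needs: an LRU policy over byte-sized entries. This is illustrative Python, not Impala's actual C++ implementation, and the class and field names are invented:

```python
from collections import OrderedDict


class TupleCacheMgrSketch:
    """Illustrative LRU cache manager: tracks entries by key and evicts the
    least recently used entries to stay under a byte-size capacity."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> entry size in bytes

    def lookup(self, key):
        """Return True if the entry exists, marking it as recently used."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def insert(self, key, size_bytes):
        """Add an entry, evicting LRU entries until the new one fits."""
        while self.entries and self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
        if size_bytes <= self.capacity_bytes:
            self.entries[key] = size_bytes
            self.used_bytes += size_bytes
```

In the real design the eviction policy, directory, and size come from startup flags, and each entry corresponds to a serialized tuple file on local disk.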






[jira] [Resolved] (IMPALA-12905) Implement disk-based tuple caching

2024-04-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12905.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Implement disk-based tuple caching
> --
>
> Key: IMPALA-12905
> URL: https://issues.apache.org/jira/browse/IMPALA-12905
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> The TupleCacheNode caches tuples to be reused later for equivalent queries. 
> This tracks implementing a version that serializes tuples and stores them as 
> files on local disk. 
> This will have a few parts:
>  # There is a TupleCacheMgr that keeps track of what entries exist in the 
> cache and evicts entries as needed to make space for new entries. This will 
> be configured using startup flags to specify the directory, size, and cache 
> eviction policy.
>  # The TupleCacheNode will interact with the TupleCacheMgr to determine if 
> the entry is available. If it is, it reads the associated tuple cache file 
> and returns the RowBatches. If the entry does not exist, it reads RowBatches 
> from its child and stores them to a new file in the cache.
>  # The TupleReader / TupleWriter implement serialization / deserialization of 
> RowBatches to/from a local file. This uses the existing serialization used 
> for KRPC.






[jira] [Created] (IMPALA-12975) Rework organization of Hadoop dependency on ARM builds

2024-04-05 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12975:
--

 Summary: Rework organization of Hadoop dependency on ARM builds
 Key: IMPALA-12975
 URL: https://issues.apache.org/jira/browse/IMPALA-12975
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Reporter: Joe McDonnell


The hadoop binaries that we download from the CDP build number are built for 
x86_64. On x86_64, HADOOP_LIB_DIR and HADOOP_INCLUDE_DIR point to the CDP 
hadoop (i.e. HADOOP_HOME/lib and HADOOP_HOME/include). Various pieces 
(including the C++ build) use these environment variables to find the native 
libraries.

On ARM, we leave those environment variables pointed to that same location. We 
fix things up by downloading a separate hadoop-client built for ARM, then 
copying the contents into the usual location in the CDP hadoop directory, 
overwriting the x86_64 contents. The code to overwrite the libraries runs on 
each invocation of buildall.sh.

On ARM, we could change this to point HADOOP_LIB_DIR to the downloaded 
hadoop-client (which is built for ARM). With a bit of work on the 
hadoop-client, we could get it to also have the header files and also point 
HADOOP_INCLUDE_DIR to it. This avoids the need to copy files during 
buildall.sh. Any build that wants to pass in a custom hadoop can then use 
HADOOP_LIB_DIR_OVERRIDE and HADOOP_INCLUDE_DIR_OVERRIDE.
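The proposal boils down to selecting the lib/include directories per architecture instead of copying files on every build. A sketch of that selection logic (illustrative; the helper name and path layout are assumptions, while the override variables come from the description above):

```python
import os
import platform


def hadoop_dirs(hadoop_home, arm_client_home):
    """Pick the Hadoop lib/include dirs based on architecture.

    On x86_64, use the CDP hadoop layout (HADOOP_HOME/lib and
    HADOOP_HOME/include); on ARM, point at the separately downloaded ARM
    hadoop-client instead of copying its contents over the CDP tree.
    Explicit overrides win in either case.
    """
    base = arm_client_home if platform.machine() == "aarch64" else hadoop_home
    lib_dir = os.environ.get("HADOOP_LIB_DIR_OVERRIDE",
                             os.path.join(base, "lib"))
    include_dir = os.environ.get("HADOOP_INCLUDE_DIR_OVERRIDE",
                                 os.path.join(base, "include"))
    return lib_dir, include_dir
```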






[jira] [Closed] (IMPALA-10263) Native toolchain support for cross compiling to produce ARM binaries

2024-04-05 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell closed IMPALA-10263.
--
Resolution: Won't Fix

Emulation is too slow and we now have ARM builds of the native-toolchain.

> Native toolchain support for cross compiling to produce ARM binaries
> 
>
> Key: IMPALA-10263
> URL: https://issues.apache.org/jira/browse/IMPALA-10263
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Priority: Major
>  Labels: arm
>
> With support for ARM added to upstream Impala, it would be useful to be able 
> to build the ARM native toolchain from an x86 machine. This would allow it to 
> be built and uploaded to s3 using the same infrastructure that currently 
> builds the x86 binaries. Having the ARM binaries in s3 opens up possibilities 
> to incorporate an ARM build into GVO.
> QEMU has the ability to emulate ARM on an x86 machine, and it is surprisingly 
> simple to get an ARM docker container running on x86. This article provides 
> some depth:
> [https://ownyourbits.com/2018/06/27/running-and-building-arm-docker-containers-in-x86/]
> The basic story is that the steps are:
>  # Install qemu-user/qemu-user-static (which installs appropriate hooks in 
> the kernel)
>  # Make qemu-aarch64-static available in the context for building the docker 
> container
>  # In the Dockerfile, copy qemu-aarch64-static into /usr/bin
> For example, here is the start of the ubuntu1804 Dockerfile:
> {noformat}
> FROM arm64v8/ubuntu:18.04
> COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static
> # The rest of the dockerfile{noformat}






[jira] [Resolved] (IMPALA-12697) Native toolchain upload to S3 silently failed

2024-03-28 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12697.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Native toolchain upload to S3 silently failed
> -
>
> Key: IMPALA-12697
> URL: https://issues.apache.org/jira/browse/IMPALA-12697
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> A native-toolchain build silently failed to upload to S3
> {code}
> 21:03:52 impala-toolchain-redhat8: Uploading 
> /mnt/build/orc-1.7.9-p10-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/11-8dbe785e9e/orc/1.7.9-p10-gcc-10.4.0/orc-1.7.9-p10-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 21:03:52 impala-toolchain-redhat8: /mnt/functions.sh: line 383: 1119230 
> Segmentation fault  (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"
> 21:03:52 impala-toolchain-redhat8: Uploading to mirror:
> 21:03:52 impala-toolchain-redhat8: Uploading 
> /mnt/build/orc-1.7.9-p10-gcc-10.4.0.tar.gz to 
> s3://native-toolchain-us-west-2/build/11-8dbe785e9e/orc/1.7.9-p10-gcc-10.4.0/orc-1.7.9-p10-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 21:03:52 impala-toolchain-redhat8: Cleaning /mnt/source/orc ...
> {code}
> It should have failed the build.






[jira] [Commented] (IMPALA-12697) Native toolchain upload to S3 silently failed

2024-03-28 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831972#comment-17831972
 ] 

Joe McDonnell commented on IMPALA-12697:


Merged the fix:
{noformat}
commit 78a3723ad8251fbd786c276bd0f6b75a6b5bae74
Author: Laszlo Gaal 
Date:   Thu Mar 7 19:35:43 2024 +0100

    IMPALA-12697: Set FAIL_ON_PUBLISH to true by default
    
    Ensure that a package upload failure fails the complete build process
    instead of just producing an incomplete toolchain in silence.
    
    This changes the default initialized value of the FAIL_ON_PUBLISH
    environment variable to 1. This still allows Jenkins jobs to supply a
    different value if it ever becomes necessary.
    
    Change-Id: I156269e8b3b1fa5d743a8ab5a83810001f7dd648
    Reviewed-on: http://gerrit.cloudera.org:8080/21134
    Reviewed-by: Joe McDonnell 
    Tested-by: Joe McDonnell 
{noformat}

> Native toolchain upload to S3 silently failed
> -
>
> Key: IMPALA-12697
> URL: https://issues.apache.org/jira/browse/IMPALA-12697
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
>
> A native-toolchain build silently failed to upload to S3
> {code}
> 21:03:52 impala-toolchain-redhat8: Uploading 
> /mnt/build/orc-1.7.9-p10-gcc-10.4.0.tar.gz to 
> s3://native-toolchain/build/11-8dbe785e9e/orc/1.7.9-p10-gcc-10.4.0/orc-1.7.9-p10-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 21:03:52 impala-toolchain-redhat8: /mnt/functions.sh: line 383: 1119230 
> Segmentation fault  (core dumped) aws s3 cp --only-show-errors 
> "${PACKAGE_FINAL_TGZ}" "${PACKAGE_S3_DESTINATION}"
> 21:03:52 impala-toolchain-redhat8: Uploading to mirror:
> 21:03:52 impala-toolchain-redhat8: Uploading 
> /mnt/build/orc-1.7.9-p10-gcc-10.4.0.tar.gz to 
> s3://native-toolchain-us-west-2/build/11-8dbe785e9e/orc/1.7.9-p10-gcc-10.4.0/orc-1.7.9-p10-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz
> 21:03:52 impala-toolchain-redhat8: Cleaning /mnt/source/orc ...
> {code}
> It should have failed the build.






[jira] [Resolved] (IMPALA-12953) CentOS 8 builds fail with python ImportError: No module named RuntimeProfile.ttypes

2024-03-28 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12953.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> CentOS 8 builds fail with python ImportError: No module named 
> RuntimeProfile.ttypes
> ---
>
> Key: IMPALA-12953
> URL: https://issues.apache.org/jira/browse/IMPALA-12953
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Zoltán Borók-Nagy
>Assignee: Laszlo Gaal
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.4.0
>
>
> Saw the following in CentOS 8 builds:
> {noformat}
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/bin/start-impala-cluster.py",
>  line 39, in <module>
> from tests.common.impala_cluster import (ImpalaCluster, 
> DEFAULT_BEESWAX_PORT,
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_cluster.py",
>  line 37, in <module>
> from tests.common.impala_service import (
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_service.py",
>  line 33, in <module>
> from tests.common.impala_connection import create_connection, 
> create_ldap_connection
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_connection.py",
>  line 31, in <module>
> from RuntimeProfile.ttypes import TRuntimeProfileFormat
> ImportError: No module named RuntimeProfile.ttypes{noformat}






[jira] [Commented] (IMPALA-12953) CentOS 8 builds fail with python ImportError: No module named RuntimeProfile.ttypes

2024-03-28 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831935#comment-17831935
 ] 

Joe McDonnell commented on IMPALA-12953:


After copying the avro tarball over to native-toolchain-us-west-2, the build gets 
past this point, so I'm going to go ahead and resolve this.

> CentOS 8 builds fail with python ImportError: No module named 
> RuntimeProfile.ttypes
> ---
>
> Key: IMPALA-12953
> URL: https://issues.apache.org/jira/browse/IMPALA-12953
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Zoltán Borók-Nagy
>Assignee: Laszlo Gaal
>Priority: Major
>  Labels: broken-build
>
> Saw the following in CentOS 8 builds:
> {noformat}
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/bin/start-impala-cluster.py",
>  line 39, in <module>
> from tests.common.impala_cluster import (ImpalaCluster, 
> DEFAULT_BEESWAX_PORT,
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_cluster.py",
>  line 37, in <module>
> from tests.common.impala_service import (
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_service.py",
>  line 33, in <module>
> from tests.common.impala_connection import create_connection, 
> create_ldap_connection
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_connection.py",
>  line 31, in <module>
> from RuntimeProfile.ttypes import TRuntimeProfileFormat
> ImportError: No module named RuntimeProfile.ttypes{noformat}






[jira] [Commented] (IMPALA-12953) CentOS 8 builds fail with python ImportError: No module named RuntimeProfile.ttypes

2024-03-28 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831917#comment-17831917
 ] 

Joe McDonnell commented on IMPALA-12953:


This is failing to download avro from the toolchain:
{noformat}
13:25:07 2024-03-26 13:25:07,262 Thread-4 ERROR: Download failed; retrying 
after sleep: Command '['wget', '-q', 
'https://native-toolchain-us-west-2.s3.us-west-2.amazonaws.com/build/25-051b912729/avro/1.7.4-p5-gcc-10.4.0/avro-1.7.4-p5-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz',
 
'--output-document=/data/jenkins/workspace/impala-asf-master-core-release-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/avro-1.7.4-p5-gcc-10.4.0-ec2-package-centos-8-aarch64.tar.gz']'
 returned non-zero exit status 8{noformat}
It looks like the toolchain build failed to publish to the 
native-toolchain-us-west-2 bucket (but succeeded in publishing to 
native-toolchain). I can copy it over to fix this, but we should also merge 
this: https://gerrit.cloudera.org/#/c/21134/
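The "retrying after sleep" behavior in the log is a standard retry loop around the wget invocation. A hedged sketch of that pattern (the attempt count, sleep, and injectable runner are assumptions for illustration, not the actual Impala download code):

```python
import subprocess
import time


def download_with_retries(cmd, attempts=3, sleep_secs=5,
                          run=subprocess.check_call):
    """Run a download command, retrying after a sleep on non-zero exit.

    wget exit status 8 means the server issued an error response (e.g. the
    object is missing from the bucket), which is the failure seen above.
    """
    for attempt in range(1, attempts + 1):
        try:
            run(cmd)
            return
        except subprocess.CalledProcessError as e:
            if attempt == attempts:
                raise  # out of retries; propagate the failure
            print("Download failed (exit %d); retrying after sleep"
                  % e.returncode)
            time.sleep(sleep_secs)
```

Note that no amount of retrying helps in this case: the tarball was never published to the mirror bucket, so the real fix is to fail the toolchain build when an upload fails.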

> CentOS 8 builds fail with python ImportError: No module named 
> RuntimeProfile.ttypes
> ---
>
> Key: IMPALA-12953
> URL: https://issues.apache.org/jira/browse/IMPALA-12953
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Zoltán Borók-Nagy
>Assignee: Laszlo Gaal
>Priority: Major
>  Labels: broken-build
>
> Saw the following in CentOS 8 builds:
> {noformat}
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/bin/start-impala-cluster.py",
>  line 39, in <module>
> from tests.common.impala_cluster import (ImpalaCluster, 
> DEFAULT_BEESWAX_PORT,
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_cluster.py",
>  line 37, in <module>
> from tests.common.impala_service import (
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_service.py",
>  line 33, in <module>
> from tests.common.impala_connection import create_connection, 
> create_ldap_connection
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-release-arm/repos/Impala/tests/common/impala_connection.py",
>  line 31, in <module>
> from RuntimeProfile.ttypes import TRuntimeProfileFormat
> ImportError: No module named RuntimeProfile.ttypes{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12807) Add option to use the 'mold' linker

2024-03-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12807.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> Add option to use the 'mold' linker
> ---
>
> Key: IMPALA-12807
> URL: https://issues.apache.org/jira/browse/IMPALA-12807
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> Mold is a new linker that claims to be faster than lld and gold. See 
> [https://github.com/rui314/mold]
> In hand testing on my machine, it makes a big difference in iteration speed:
> {noformat}
> # Test case:
> #  - Start with fully built impalad
> #  - touch be/src/scheduling/scheduler.cc 
> #  - time make -j12 impalad
> With Gold (current default):
> real    0m15.843s
> user    0m15.478s
> sys     0m2.127s
> real    0m15.820s
> user    0m15.302s
> sys     0m2.157s
> real    0m16.136s
> user    0m15.799s
> sys     0m2.098s
> With Mold:
> real    0m2.479s
> user    0m2.169s
> sys     0m0.958s
> real    0m2.674s
> user    0m2.218s
> sys     0m1.086s
> real    0m2.524s
> user    0m2.136s
> sys     0m1.042s{noformat}
> This seems like something we should investigate further.
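For context, opting into mold is typically just a compiler-driver flag; a hedged shell sketch (the helper function is illustrative, and how Impala's build scripts actually wire this up may differ):

```shell
# Select a linker via the compiler driver's -fuse-ld flag (supported by
# GCC and Clang). The wrapper function is illustrative only.
linker_flag() {
  case "$1" in
    mold) echo "-fuse-ld=mold" ;;
    lld)  echo "-fuse-ld=lld" ;;
    gold) echo "-fuse-ld=gold" ;;
    *)    echo "" ;;
  esac
}

# e.g. pass to the build: CMAKE_EXE_LINKER_FLAGS="$(linker_flag mold)"
linker_flag mold
```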






[jira] [Assigned] (IMPALA-12807) Add option to use the 'mold' linker

2024-03-25 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12807:
--

Assignee: Joe McDonnell

> Add option to use the 'mold' linker
> ---
>
> Key: IMPALA-12807
> URL: https://issues.apache.org/jira/browse/IMPALA-12807
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>






[jira] [Resolved] (IMPALA-12818) Implement the initial framework for caching tuples

2024-03-19 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12818.

Fix Version/s: Impala 4.4.0
 Assignee: Joe McDonnell
   Resolution: Fixed

> Implement the initial framework for caching tuples
> --
>
> Key: IMPALA-12818
> URL: https://issues.apache.org/jira/browse/IMPALA-12818
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> This tracks introducing the basic concepts of tuple caching. Specifically, it 
> should introduce the TupleCacheNode, which is an ExecNode that can be placed 
> in a plan to cache tuples between existing nodes in the plan. The 
> TupleCachePlanner determines which plan locations are eligible, places these 
> nodes there, and produces a cache key for each one.
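A miniature Python sketch of the split described above (names and structure are illustrative, not Impala's actual backend API):

```python
# Hedged sketch: a pass-through exec node caches its child's row batches
# under a planner-produced key and serves them directly on a hit.
import hashlib

class TupleCacheNode:
    def __init__(self, child, cache_key, cache):
        self.child = child          # upstream ExecNode-like iterable of row batches
        self.cache_key = cache_key  # produced by the planner from the subtree below
        self.cache = cache          # shared dict: cache key -> list of row batches

    def get_batches(self):
        if self.cache_key in self.cache:
            # Cache hit: skip executing the subtree entirely.
            return self.cache[self.cache_key]
        # Cache miss: pull batches from the child and store them for reuse.
        batches = list(self.child)
        self.cache[self.cache_key] = batches
        return batches

def compute_cache_key(plan_subtree_repr: str) -> str:
    # The planner hashes a canonical representation of the eligible subtree.
    return hashlib.sha256(plan_subtree_repr.encode()).hexdigest()

cache = {}
node = TupleCacheNode(iter([[1, 2], [3, 4]]), compute_cache_key("scan t1"), cache)
print(node.get_batches())  # first call executes the child and populates the cache
```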






[jira] [Resolved] (IMPALA-12883) Add support for changing the charge for a cache entry

2024-03-19 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12883.

Fix Version/s: Impala 4.4.0
 Assignee: Joe McDonnell
   Resolution: Fixed

> Add support for changing the charge for a cache entry
> -
>
> Key: IMPALA-12883
> URL: https://issues.apache.org/jira/browse/IMPALA-12883
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> The Cache implementation in be/src/util/cache currently does not support 
> modifying the charge of a cache entry after it has been created. For cases 
> where the size is known up front, this is fine. For example, the data cache 
> knows the number of bytes it will consume before it creates the cache entry.
> This is a problem for caches that may not know the size of an entry up front. 
> For example, the tuple cache may want to create a cache entry immediately to 
> avoid concurrency issues, but then it would want to update that entry's 
> charge as the entry is finalized (or reaches certain size increments).
> It would also be useful to expose the maximum charge allowed for a cache 
> entry. This would allow writers to avoid creating a cache entry that is too 
> large.
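A Python sketch of the requested behavior (update_charge and the eviction details are assumptions, not the actual be/src/util/cache interface):

```python
# Illustrative sketch: an entry can be inserted with a small placeholder
# charge and grown later, with eviction re-checked on each update.
class Cache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> charge, insertion-ordered (stand-in for LRU)
        self.usage = 0

    def insert(self, key, charge):
        self.entries[key] = charge
        self.usage += charge
        self._evict_if_needed(exclude=key)

    def update_charge(self, key, new_charge):
        # Adjust total usage by the delta, then re-run eviction in case the
        # entry grew past what the cache can hold alongside its neighbors.
        self.usage += new_charge - self.entries[key]
        self.entries[key] = new_charge
        self._evict_if_needed(exclude=key)

    def _evict_if_needed(self, exclude):
        for key in list(self.entries):
            if self.usage <= self.capacity:
                break
            if key != exclude:  # never evict the entry being finalized
                self.usage -= self.entries.pop(key)

cache = Cache(capacity=100)
cache.insert("tuple-entry", charge=1)    # created small to avoid concurrency issues
cache.update_charge("tuple-entry", 60)   # grown as results are written
print(cache.usage)
```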






[jira] [Created] (IMPALA-12908) Add a correctness verification mode for tuple caching

2024-03-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12908:
--

 Summary: Add a correctness verification mode for tuple caching
 Key: IMPALA-12908
 URL: https://issues.apache.org/jira/browse/IMPALA-12908
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


To get more coverage of tuple caching correctness, it would be useful to have 
automated correctness checking. In this mode, the tuple cache node would fetch 
results from its child, persist them to disk, then compare them to the existing 
cache contents at the end. The goal is to be able to run a variety of queries, 
including various end-to-end tests, and verify that there is no variability in 
the results stored to the cache.
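The proposed check could look roughly like this (Python sketch; the function name is hypothetical):

```python
# Sketch of the verification flow: fetch fresh results from the child and
# compare them against the cached contents, flagging any divergence.
def verify_against_cache(child_batches, cached_batches):
    fresh = list(child_batches)
    if fresh != list(cached_batches):
        raise AssertionError("tuple cache contents diverge from child results")
    return fresh

print(verify_against_cache(iter([[1, 2]]), [[1, 2]]))  # matching results pass
```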






[jira] [Created] (IMPALA-12907) Add test for TPC-H / TPC-DS queries with tuple caching

2024-03-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12907:
--

 Summary: Add test for TPC-H / TPC-DS queries with tuple caching
 Key: IMPALA-12907
 URL: https://issues.apache.org/jira/browse/IMPALA-12907
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


As a sanity check for tuple caching, we should run the TPC-H/TPC-DS queries 
with tuple caching enabled more than once and check for correct results.

This is also a good time to introduce a way to run the Impala cluster with 
tuple caching enabled via an environment variable. The goal is to be able to 
run all the tuple caching tests as regular end-to-end tests rather than custom 
cluster tests.
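A sketch of the environment-variable hook (TUPLE_CACHE_DIR and --tuple_cache are hypothetical names, not actual Impala flags):

```python
# Derive cluster startup flags from an environment variable so the same
# end-to-end test jobs can run with or without tuple caching.
import os

def cluster_args():
    args = []
    cache_dir = os.environ.get("TUPLE_CACHE_DIR")
    if cache_dir:
        args.append("--tuple_cache=" + cache_dir)
    return args

os.environ["TUPLE_CACHE_DIR"] = "/tmp/tuple_cache"
print(cluster_args())
```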






[jira] [Created] (IMPALA-12906) Incorporate run time scan range information into the tuple cache key

2024-03-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12906:
--

 Summary: Incorporate run time scan range information into the 
tuple cache key
 Key: IMPALA-12906
 URL: https://issues.apache.org/jira/browse/IMPALA-12906
 Project: IMPALA
  Issue Type: Task
  Components: Backend, Frontend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The cache key for tuple caching currently doesn't incorporate information about 
the scan ranges for the tables that it scans. This is important for detecting 
changes in the table and having different cache keys for different fragment 
instances that are assigned different scan ranges.

To make this deterministic for mt_dop, we need mt_dop to assign scan ranges 
deterministically to individual fragment instances rather than using the shared 
queue introduced in IMPALA-9655.

One way to implement this is to collect information about the scan nodes that 
feed into the tuple cache and pass that information over to the tuple cache 
node. At runtime, it can hash the scan ranges assigned to those scan nodes and 
incorporate that into the cache key.
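A Python sketch of the keying idea (not Impala's actual hashing or on-disk formats):

```python
# Fold the scan ranges assigned to a fragment instance into the compile-time
# key, so changed files or different range assignments yield different keys.
import hashlib

def runtime_cache_key(compile_time_key, scan_ranges):
    h = hashlib.sha256(compile_time_key.encode())
    # Sort for determinism: the same set of ranges must always hash the same,
    # which is why range assignment itself must also be deterministic.
    for path, offset, length in sorted(scan_ranges):
        h.update(f"{path}:{offset}:{length}".encode())
    return h.hexdigest()

key_a = runtime_cache_key("plan-key", [("f1.parq", 0, 1024)])
key_b = runtime_cache_key("plan-key", [("f2.parq", 0, 1024)])
print(key_a != key_b)  # different assignments produce different keys
```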






[jira] [Created] (IMPALA-12905) Implement disk-based tuple caching

2024-03-14 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12905:
--

 Summary: Implement disk-based tuple caching
 Key: IMPALA-12905
 URL: https://issues.apache.org/jira/browse/IMPALA-12905
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The TupleCacheNode caches tuples to be reused later for equivalent queries. 
This tracks implementing a version that serializes tuples and stores them as 
files on local disk. 

This will have a few parts:
 # There is a TupleCacheMgr that keeps track of what entries exist in the cache 
and evicts entries as needed to make space for new entries. This will be 
configured using startup flags to specify the directory, size, and cache 
eviction policy.
 # The TupleCacheNode will interact with the TupleCacheMgr to determine if the 
entry is available. If it is, it reads the associated tuple cache file and 
returns the RowBatches. If the entry does not exist, it reads RowBatches from 
its child and stores them to a new file in the cache.
 # The TupleReader / TupleWriter implement serialization / deserialization of 
RowBatches to/from a local file. This uses the existing serialization used for 
KRPC.
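A minimal Python sketch of the reader/writer round trip (pickle here is purely illustrative; per the description, the real implementation reuses the existing KRPC row-batch serialization):

```python
# TupleWriter/TupleReader idea: serialize row batches to a local cache file
# on a miss, and deserialize them back on a hit.
import os
import pickle
import tempfile

def write_batches(path, batches):
    with open(path, "wb") as f:
        for batch in batches:
            pickle.dump(batch, f)  # one serialized row batch per record

def read_batches(path):
    batches = []
    with open(path, "rb") as f:
        while True:
            try:
                batches.append(pickle.load(f))
            except EOFError:
                break
    return batches

entry_path = os.path.join(tempfile.mkdtemp(), "entry.cache")
write_batches(entry_path, [[1, 2], [3, 4]])
print(read_batches(entry_path))  # round-trips the cached batches
```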






[jira] [Created] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-03-13 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12900:
--

 Summary: Compile binutils with -O3 in the toolchain
 Key: IMPALA-12900
 URL: https://issues.apache.org/jira/browse/IMPALA-12900
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell


Since the toolchain builds binutils with the native compiler (the toolchain 
compiler hasn't been built yet at that point), CFLAGS is never set for this 
step. Binutils' default CFLAGS use -O2. It's possible that we could get a bit 
more speed by building with -O3, so we should set CFLAGS/CXXFLAGS to -O3.
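A sketch of the intended override (the configure invocation is the standard binutils build; how the toolchain scripts integrate this is an assumption):

```shell
# Override binutils' default -O2 by exporting optimization flags before
# configuring the build.
export CFLAGS="-O3"
export CXXFLAGS="-O3"

# Then, in the toolchain's binutils build step:
#   ./configure --prefix="$PREFIX" && make -j"$(nproc)"
echo "CFLAGS=$CFLAGS CXXFLAGS=$CXXFLAGS"
```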





