[jira] [Commented] (IMPALA-12979) Wildcard in CLASSPATH might not work in the RPM package

2024-04-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835589#comment-17835589
 ] 

Quanlong Huang commented on IMPALA-12979:
-

Maybe some changes in the JNI code make the difference. I don't see this issue 
in the master branch.

> Wildcard in CLASSPATH might not work in the RPM package
> ---
>
> Key: IMPALA-12979
> URL: https://issues.apache.org/jira/browse/IMPALA-12979
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> I tried deploying the RPM package of Impala-3.4.2 (commit 8e9c5a5) on CentOS 
> 7.9 and found launching catalogd failed by the following error (in 
> catalogd.INFO):
> {noformat}
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/5e3c8819-0593-4943-555addbc-665470ad.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x02baf14c, pid=156082, tid=0x7fec0dce59c0
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_141-b15) (build 
> 1.8.0_141-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [catalogd+0x27af14c]  
> llvm::SCEVAddRecExpr::getNumIterationsInRange(llvm::ConstantRange const&, 
> llvm::ScalarEvolution&) const+0x73c
> #
> # Core dump written. Default location: /opt/impala/core or core.156082
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid156082.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> # {noformat}
> There are other logs in catalogd.ERROR
> {noformat}
> Log file created at: 2024/04/08 04:49:28
> Running on machine: ccycloud-1.quanlong.root.comops.site
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> E0408 04:49:28.979386 158187 logging.cc:146] stderr will be logged to this 
> file.
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/6c3f550c-be96-4a5b-61171aac-0de15155.dmp
> could not find method getRootCauseMessage from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> could not find method getStackTrace from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> FileSystem: loadFileSystems failed error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError){noformat}
> Resolving the minidump shows me the following stacktrace:
> {noformat}
> (gdb) bt
> #0  0x02baf14c in ?? ()
> #1  0x02baee24 in getJNIEnv ()
> #2  0x02bacb71 in hdfsBuilderConnect ()
> #3  0x012e6ae2 in impala::JniUtil::InitLibhdfs() ()
> #4  0x012e7897 in impala::JniUtil::Init() ()
> #5  0x00be9297 in impala::InitCommonRuntime(int, char**, bool, 
> impala::TestInfo::Mode) ()
> #6  0x00bb604a in CatalogdMain(int, char**) ()
> #7  0x00b33f97 in main (){noformat}
> It indicates something went wrong while initializing the JVM. Here are the env vars:
> {noformat}
> Environment Variables:
> JAVA_HOME=/usr/java/jdk1.8.0_141
> CLASSPATH=/opt/impala/conf:/opt/impala/jar/*
> PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin
> LD_LIBRARY_PATH=/opt/impala/lib/:/usr/java/jdk1.8.0_141/jre/lib/amd64/server:/usr/java/jdk1.8.0_141/jre/lib/amd64
> SHELL=/bin/bash{noformat}
> We use the wildcard "*" in the classpath, which seems to be the cause. The 
> issue was resolved after using explicit paths in the classpath. Here is what 
> I changed in bin/impala-env.sh:
> {code:bash}
> #export CLASSPATH="/opt/impala/conf:/opt/impala/jar/*"
> CLASSPATH=/opt/impala/conf
> for jar in /opt/impala/jar/*.jar; do
>   CLASSPATH="$CLASSPATH:$jar"
> done
> export CLASSPATH
> {code}
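For deployments scripted outside the shell, the same explicit-expansion workaround can be sketched in Python. This is purely illustrative and not part of Impala; the jar directory layout is taken from the report above.

```python
import glob
import os


def build_classpath(conf_dir, jar_dir):
    """Build an explicit CLASSPATH: the conf dir followed by every jar,
    listed one by one, because a JVM created through JNI may not expand
    the '*' wildcard the way the 'java' launcher does."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ":".join([conf_dir] + jars)
```

The result can then be exported as CLASSPATH before the daemons start, mirroring the bin/impala-env.sh change.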



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12905) Implement disk-based tuple caching

2024-04-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835581#comment-17835581
 ] 

ASF subversion and git services commented on IMPALA-12905:
--

Commit 6121c4f7d61fb9f2341cf14e1be3404325fb35b9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6121c4f7d ]

IMPALA-12905: Disk-based tuple caching

This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.

The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.

Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
 1. It can find a COMPLETE entry and read it.
 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
cannot read or write the entry.
 3. It finds no entry and inserts its own IN_PROGRESS entry
to start a write.
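The entry state machine described above can be modeled as a toy sketch. This is illustrative Python, not the actual C++ implementation inside the TupleCacheMgr, and all names here are made up.

```python
from enum import Enum
from threading import Lock


class State(Enum):
    IN_PROGRESS = 1
    COMPLETE = 2
    TOMBSTONE = 3


class TupleCacheSketch:
    """Toy model of the cache-entry state machine (names are illustrative)."""

    def __init__(self, max_entry_size):
        self.max_entry_size = max_entry_size
        self.entries = {}  # key -> (state, size)
        self.lock = Lock()

    def lookup(self, key):
        """Return 'read', 'write', or 'skip', matching the three outcomes."""
        with self.lock:
            entry = self.entries.get(key)
            if entry is None:
                # Outcome 3: insert an IN_PROGRESS entry and start a write.
                self.entries[key] = (State.IN_PROGRESS, 0)
                return "write"
            state, _ = entry
            if state is State.COMPLETE:
                return "read"   # Outcome 1: a finished entry can be read.
            return "skip"       # Outcome 2: IN_PROGRESS or TOMBSTONE.

    def complete_write(self, key, size):
        with self.lock:
            if size > self.max_entry_size:
                # Won't succeed on a retry either: tombstone the entry.
                self.entries[key] = (State.TOMBSTONE, 0)
            else:
                self.entries[key] = (State.COMPLETE, size)

    def abort_write(self, key):
        # A retry might succeed, so drop the entry entirely.
        with self.lock:
            self.entries.pop(key, None)
```

The IN_PROGRESS insertion under the lock is what prevents two writers from producing the same entry concurrently.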

The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.
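A rough sketch of parsing the directory:capacity format follows. Splitting on the last ':' and the set of supported suffixes are assumptions for illustration, not the actual flag parser.

```python
def parse_tuple_cache(value):
    """Parse 'directory:capacity', e.g. '/data/0:100GB', into a directory
    and a byte count. Split on the last ':' so paths may contain colons."""
    directory, _, capacity = value.rpartition(":")
    for suffix, mult in (("KB", 1 << 10), ("MB", 1 << 20),
                         ("GB", 1 << 30), ("TB", 1 << 40)):
        if capacity.upper().endswith(suffix):
            return directory, int(capacity[:-2]) * mult
    return directory, int(capacity)  # plain byte count
```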

This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.

Testing:
 - This adds basic custom cluster tests for the tuple cache.

Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Reviewed-on: http://gerrit.cloudera.org:8080/21171
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 
Reviewed-by: Kurt Deschler 


> Implement disk-based tuple caching
> --
>
> Key: IMPALA-12905
> URL: https://issues.apache.org/jira/browse/IMPALA-12905
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Major
>
> The TupleCacheNode caches tuples to be reused later for equivalent queries. 
> This tracks implementing a version that serializes tuples and stores them as 
> files on local disk. 
> This will have a few parts:
>  # There is a TupleCacheMgr that keeps track of what entries exist in the 
> cache and evicts entries as needed to make space for new entries. This will 
> be configured using startup flags to specify the directory, size, and cache 
> eviction policy.
>  # The TupleCacheNode will interact with the TupleCacheMgr to determine if 
> the entry is available. If it is, it reads the associated tuple cache file 
> and returns the RowBatches. If the entry does not exist, it reads RowBatches 
> from its child and stores them to a new file in the cache.
>  # The TupleReader / TupleWriter implement serialization / deserialization of 
> RowBatches to/from a local file. This uses the existing serialization used 
> for KRPC.
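Step 3's write/read round-trip can be sketched with simple length-prefixed framing. This Python stand-in is only illustrative: the real code uses RowBatch's KRPC serialization, and these function names are hypothetical.

```python
import struct


def write_batches(path, batches):
    """Write each batch as a length-prefixed blob (stand-in framing)."""
    with open(path, "wb") as f:
        for batch in batches:
            payload = ",".join(batch).encode()  # stand-in for serialized rows
            f.write(struct.pack("<I", len(payload)))
            f.write(payload)


def read_batches(path):
    """Read the length-prefixed batches back until end of file."""
    batches = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (length,) = struct.unpack("<I", header)
            batches.append(f.read(length).decode().split(","))
    return batches
```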






[jira] [Updated] (IMPALA-12989) LICENSE and NOTICE files are missing in DEB/RPM packages

2024-04-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12989:

Description: 
In order to release the binaries, we need the LICENSE and NOTICE files added to 
the DEB/RPM packages.
{quote}*COMPILED PACKAGES*

The Apache Software Foundation produces open source software. All releases are 
in the form of the source materials needed to make changes to the software 
being released.

As a convenience to users that might not have the appropriate tools to build a 
compiled version of the source, binary/bytecode packages MAY be distributed 
alongside official Apache releases. In all such cases, the binary/bytecode 
package MUST have the same version number as the source release and MUST only 
add binary/bytecode files that are the result of compiling that version of the 
source code release and its dependencies.

*Licensing Documentation*

Each package MUST provide a {{LICENSE}} file and a {{NOTICE}} file which 
account for the package's exact content. {{LICENSE}} and {{NOTICE}} MUST NOT 
provide unnecessary information about materials which are not bundled in the 
package, such as separately downloaded dependencies.

For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the root of 
the distribution. For additional packages, they MUST be located in the 
distribution format's customary location for licensing materials, such as the 
{{META-INF}} directory of Java "jar" files.
{quote}
[https://www.apache.org/legal/release-policy.html#licensing-documentation]

  was:
In order to release the binaries, we need the LICENSE and NOTICE files added in 
the DEB/RPM packages.
{quote}For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the 
root of the distribution. For additional packages, they MUST be located in the 
distribution format's customary location for licensing materials, such as the 
{{META-INF}} directory of Java "jar" files.
{quote}
[https://www.apache.org/legal/release-policy.html#licensing-documentation]


> LICENSE and NOTICE files are missing in DEB/RPM packages
> 
>
> Key: IMPALA-12989
> URL: https://issues.apache.org/jira/browse/IMPALA-12989
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> In order to release the binaries, we need the LICENSE and NOTICE files added 
> to the DEB/RPM packages.
> {quote}*COMPILED PACKAGES*
> The Apache Software Foundation produces open source software. All releases 
> are in the form of the source materials needed to make changes to the 
> software being released.
> As a convenience to users that might not have the appropriate tools to build 
> a compiled version of the source, binary/bytecode packages MAY be distributed 
> alongside official Apache releases. In all such cases, the binary/bytecode 
> package MUST have the same version number as the source release and MUST only 
> add binary/bytecode files that are the result of compiling that version of 
> the source code release and its dependencies.
> *Licensing Documentation*
> Each package MUST provide a {{LICENSE}} file and a {{NOTICE}} file which 
> account for the package's exact content. {{LICENSE}} and {{NOTICE}} MUST NOT 
> provide unnecessary information about materials which are not bundled in the 
> package, such as separately downloaded dependencies.
> For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the root 
> of the distribution. For additional packages, they MUST be located in the 
> distribution format's customary location for licensing materials, such as the 
> {{META-INF}} directory of Java "jar" files.
> {quote}
> [https://www.apache.org/legal/release-policy.html#licensing-documentation]






[jira] [Updated] (IMPALA-12989) LICENSE and NOTICE files are missing in DEB/RPM packages

2024-04-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12989:

Description: 
In order to release the binaries, we need the LICENSE and NOTICE files added to 
the DEB/RPM packages.
{quote}For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the 
root of the distribution. For additional packages, they MUST be located in the 
distribution format's customary location for licensing materials, such as the 
{{META-INF}} directory of Java "jar" files.
{quote}
[https://www.apache.org/legal/release-policy.html#licensing-documentation]

  was:
In order to release the binaries, we need the LICENSE and NOTICE files added in 
the DEB/RPM packages.
{quote}Every Apache distribution must include a NOTICE file in the top 
directory, along with the standard LICENSE file.
{quote}
https://www.apache.org/legal/src-headers.html#notice


> LICENSE and NOTICE files are missing in DEB/RPM packages
> 
>
> Key: IMPALA-12989
> URL: https://issues.apache.org/jira/browse/IMPALA-12989
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> In order to release the binaries, we need the LICENSE and NOTICE files added 
> to the DEB/RPM packages.
> {quote}For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the 
> root of the distribution. For additional packages, they MUST be located in 
> the distribution format's customary location for licensing materials, such as 
> the {{META-INF}} directory of Java "jar" files.
> {quote}
> [https://www.apache.org/legal/release-policy.html#licensing-documentation]






[jira] [Created] (IMPALA-12989) LICENSE and NOTICE files are missing in DEB/RPM packages

2024-04-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12989:
---

 Summary: LICENSE and NOTICE files are missing in DEB/RPM packages
 Key: IMPALA-12989
 URL: https://issues.apache.org/jira/browse/IMPALA-12989
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Quanlong Huang
Assignee: Quanlong Huang


In order to release the binaries, we need the LICENSE and NOTICE files added to 
the DEB/RPM packages.
{quote}Every Apache distribution must include a NOTICE file in the top 
directory, along with the standard LICENSE file.
{quote}
https://www.apache.org/legal/src-headers.html#notice






[jira] [Work started] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds

2024-04-09 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12963 started by Michael Smith.
--
> Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
> ---
>
> Key: IMPALA-12963
> URL: https://issues.apache.org/jira/browse/IMPALA-12963
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Michael Smith
>Priority: Major
>
> Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with 
> the following messages:
> *Error Message*
> {code:java}
> test setup failure
> {code}
> *Stacktrace*
> {code:java}
> common/custom_cluster_test_suite.py:226: in teardown_method
> impalad.wait_for_exit()
> common/impala_cluster.py:471: in wait_for_exit
> while self.__get_pid() is not None:
> common/impala_cluster.py:414: in __get_pid
> assert len(pids) < 2, "Expected single pid but found %s" % ", 
> ".join(map(str, pids))
> E   AssertionError: Expected single pid but found 892, 31942
> {code}
> *Standard Error*
> {code:java}
> -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=1 --cluster_id=test_max_select 
> --shutdown_grace_period_s=10 --shutdown_deadline_s=60 
> --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' 
> '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 04:21:44 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:21:44 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:47 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:48 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2
> 04:21:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:50 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:50 MainThread: num_known_live_backends has reached value: 3
> 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:51 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25001
> 04:21:51 MainThread: num_known_live_backends has reached value: 3
> 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:51 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25002
> 04:21:51 MainThread: num_known_live_backends has reached value: 3
> 04:21:52 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2024-03-28 04:21:52,490 DEB

[jira] [Assigned] (IMPALA-12988) Calculate an unbounded version of CpuAsk

2024-04-09 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto reassigned IMPALA-12988:
-

Assignee: Riza Suminto

> Calculate an unbounded version of CpuAsk
> 
>
> Key: IMPALA-12988
> URL: https://issues.apache.org/jira/browse/IMPALA-12988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> CpuAsk is calculated through a recursive call beginning at 
> Planner.computeBlockingAwareCores(), which is called after 
> Planner.computeEffectiveParallelism(). It does blocking operator analysis 
> over the degree of parallelism selected during the 
> computeEffectiveParallelism() traversal. That selected degree of parallelism, 
> however, is already bounded by the min and max parallelism config, derived 
> from the PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE 
> options respectively.
> It would be beneficial to have another version of CpuAsk that is not bounded 
> by the min and max parallelism config. It should be based purely on the 
> fragments' ProcessingCost and the query plan's relationship constraints 
> (i.e., the number of JOIN BUILDER fragments should equal the number of JOIN 
> fragments for a partitioned join). During executor group set selection, the 
> Frontend should use the unbounded CpuAsk number to avoid prematurely 
> assigning a query to a small executor group set.






[jira] [Created] (IMPALA-12988) Calculate an unbounded version of CpuAsk

2024-04-09 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-12988:
-

 Summary: Calculate an unbounded version of CpuAsk
 Key: IMPALA-12988
 URL: https://issues.apache.org/jira/browse/IMPALA-12988
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Riza Suminto


CpuAsk is calculated through a recursive call beginning at 
Planner.computeBlockingAwareCores(), which is called after 
Planner.computeEffectiveParallelism(). It does blocking operator analysis over 
the degree of parallelism selected during the computeEffectiveParallelism() 
traversal. That selected degree of parallelism, however, is already bounded by 
the min and max parallelism config, derived from the 
PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options 
respectively.

It would be beneficial to have another version of CpuAsk that is not bounded 
by the min and max parallelism config. It should be based purely on the 
fragments' ProcessingCost and the query plan's relationship constraints (i.e., 
the number of JOIN BUILDER fragments should equal the number of JOIN fragments 
for a partitioned join). During executor group set selection, the Frontend 
should use the unbounded CpuAsk number to avoid prematurely assigning a query 
to a small executor group set.
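As a toy illustration of how a per-fragment min/max clamp changes the summed parallelism: the numbers and the clamping rule below are illustrative, not Impala's actual cost model.

```python
def cpu_ask(cost_per_fragment, min_parallelism=None, max_parallelism=None):
    """Sum a per-fragment degree of parallelism derived from processing
    cost; optionally clamp each fragment's degree to [min, max], roughly
    how the bounded CpuAsk differs from an unbounded one."""
    total = 0
    for total_cost, cost_per_instance in cost_per_fragment:
        degree = -(-total_cost // cost_per_instance)  # ceiling division
        if min_parallelism is not None:
            degree = max(degree, min_parallelism)
        if max_parallelism is not None:
            degree = min(degree, max_parallelism)
        total += degree
    return total
```

With a heavy and a light fragment, the unbounded sum can exceed the bounded one, which is why the unbounded number is more useful when picking an executor group set.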






[jira] [Updated] (IMPALA-11776) Use hive.metastore.table.owner for create table

2024-04-09 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11776:
---
Issue Type: Improvement  (was: New Feature)

> Use hive.metastore.table.owner for create table
> ---
>
> Key: IMPALA-11776
> URL: https://issues.apache.org/jira/browse/IMPALA-11776
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Zihao Ye
>Priority: Major
>  Labels: iceberg
>
> https://issues.apache.org/jira/browse/IMPALA-11429 made the creation of an 
> Iceberg table happen in two steps. The first step creates the table, albeit 
> with the wrong owner, and the second step is an ALTER TABLE to set the 
> correct table owner.
> Since Iceberg 1.1.0 there is a way to provide a table owner via a table 
> property, so we can make the create table operation take one step again.
> https://github.com/apache/iceberg/pull/5763
> https://github.com/apache/iceberg/pull/6154






[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables, as from that point on the table 
can't be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
the one in IMPALA-11499.

Note that Java handles \0 characters in Unicode in a special way, which may be 
related:
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's

Note Java handles  \0 characters in unicode in a special way, which may be 
related: 
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables.
> The issue is more severe in Iceberg tables, as from that point on the table 
> can't be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> The partition directory created above seems truncated:
> hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to the one in IMPALA-11499.
> Note that Java handles \0 characters in Unicode in a special way, which may 
> be related:
> https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542
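The JNI document linked above describes Java's "modified UTF-8", which encodes U+0000 as the two-byte sequence 0xC0 0x80 rather than a single zero byte, so JNI strings never contain an embedded NUL. A minimal sketch of that behavior, covering only this case:

```python
def modified_utf8(s):
    """Encode like JNI's modified UTF-8 for the \0 case: U+0000 becomes
    the overlong pair 0xC0 0x80 so no embedded NUL bytes appear. (The
    full scheme also treats supplementary characters specially; that is
    omitted here.)"""
    out = bytearray()
    for ch in s:
        if ch == "\x00":
            out += b"\xc0\x80"
        else:
            out += ch.encode("utf-8")  # fine for other BMP characters
    return bytes(out)
```

If some layer encodes the partition value this way while another treats 0x00 as a C-string terminator, a value like "a\0a" could plausibly end up truncated to "a", matching the truncated partition directory above.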






[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables, as from that point on the table 
can't be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
the one in IMPALA-11499.

Note that Java handles \0 characters in Unicode in a special way, which may be 
related:
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables.
> The issue is more severe in Iceberg tables, as from that point on the table 
> can't be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> The partition directory created above seems truncated:
> hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to the one in IMPALA-11499.
> Note that Java handles \0 characters in Unicode strings in a special way, 
> which may be related: 
> https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12987:


 Summary: Errors with \0 character in partition values
 Key: IMPALA-12987
 URL: https://issues.apache.org/jira/browse/IMPALA-12987
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables. 

The issue is more severe in Iceberg tables, as from this point the table 
can't be read in Impala or Hive:
{code}
 create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
the one in IMPALA-11499.







[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables. 

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
the one in IMPALA-11499.


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue issue more severe in Iceberg tables as from this point the table 
can't be read in Impala or Hive:
{code}
 create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similare to 
IMPALA-11499's



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables. 
> The issue is more severe in Iceberg tables as from this point the table can't 
> be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to the one in IMPALA-11499.






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-04-09 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835458#comment-17835458
 ] 

Maxwell Guo commented on IMPALA-12771:
--

ping [~mylogi...@gmail.com][~stigahuang][~VenuReddy]:D

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>  // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were 
> ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in 
> the catalogd.
> {code}
>  
> As for CREATE and DROP events on Database/Table/Partition (AddPartition is 
> also included): when the database or table is not found in the cache, we skip 
> processing the event and increment the events-skipped metric.
> But I found some inconsistencies here for alter table and Reload events:
> * Reload events are not described in the events-skipped description, but the 
> metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual 
> implementation.
> So can we also mark the events-skipped metric for alter partition events, and 
> modify the description to cover all skipped events?






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-04-09 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835459#comment-17835459
 ] 

Maxwell Guo commented on IMPALA-12709:
--

[~VenuReddy] any update here ?:D

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, metastore event processor is single threaded. Notification events 
> are processed sequentially with a maximum limit of 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing try to access/update the catalog objects 
> concurrently. Waiting for a lock or file metadata loading of a table can slow 
> the event processing and can affect the processing of other events following 
> it. Those events may not be dependent on the previous event. Altogether it 
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> Existing metastore event processing can be turned into multi-level event 
> processing. Idea is to segregate the events based on their dependency, 
> maintain the order of events as they occur within the dependency and process 
> them independently as much as possible:
>  # All the events of a table are processed in the same order they have 
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all the events relating to the database (i.e., 
> for all its tables) occurring after the alter db event are processed only 
> after the alter database event is processed, ensuring the order.
> An initial proposal design document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
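The per-table ordering rules above can be sketched with keyed queues. This is a minimal illustrative Python model (names like `dispatch` are hypothetical, not the proposed implementation):

```python
# Route HMS events by (db, table): events for the same table keep their
# arrival order, while queues for different tables could be drained by
# independent workers in parallel.
from collections import defaultdict
from queue import Queue

queues = defaultdict(Queue)

def dispatch(event):
    # Same (db, table) key -> same queue -> per-table ordering is preserved.
    queues[(event["db"], event["table"])].put(event)

for seq, tbl in enumerate(["t1", "t2", "t1"]):
    dispatch({"db": "default", "table": tbl, "seq": seq})

t1 = queues[("default", "t1")]
order = [t1.get()["seq"] for _ in range(t1.qsize())]
print(order)  # [0, 2]: both t1 events, still in arrival order
```

A real implementation would also need the alter-database barrier described in point 3, which this sketch does not model.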






[jira] [Commented] (IMPALA-12979) Wildcard in CLASSPATH might not work in the RPM package

2024-04-09 Thread XiangYang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835425#comment-17835425
 ] 

XiangYang commented on IMPALA-12979:


It seems fine, but I'm curious why this wasn't a problem before.

> Wildcard in CLASSPATH might not work in the RPM package
> ---
>
> Key: IMPALA-12979
> URL: https://issues.apache.org/jira/browse/IMPALA-12979
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> I tried deploying the RPM package of Impala-3.4.2 (commit 8e9c5a5) on CentOS 
> 7.9 and found launching catalogd failed by the following error (in 
> catalogd.INFO):
> {noformat}
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/5e3c8819-0593-4943-555addbc-665470ad.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x02baf14c, pid=156082, tid=0x7fec0dce59c0
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_141-b15) (build 
> 1.8.0_141-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [catalogd+0x27af14c]  
> llvm::SCEVAddRecExpr::getNumIterationsInRange(llvm::ConstantRange const&, 
> llvm::ScalarEvolution&) const+0x73c
> #
> # Core dump written. Default location: /opt/impala/core or core.156082
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid156082.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> # {noformat}
> There are other logs in catalogd.ERROR
> {noformat}
> Log file created at: 2024/04/08 04:49:28
> Running on machine: ccycloud-1.quanlong.root.comops.site
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> E0408 04:49:28.979386 158187 logging.cc:146] stderr will be logged to this 
> file.
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/6c3f550c-be96-4a5b-61171aac-0de15155.dmp
> could not find method getRootCauseMessage from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> could not find method getStackTrace from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> FileSystem: loadFileSystems failed error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError){noformat}
> Resolving the minidump shows me the following stacktrace:
> {noformat}
> (gdb) bt
> #0  0x02baf14c in ?? ()
> #1  0x02baee24 in getJNIEnv ()
> #2  0x02bacb71 in hdfsBuilderConnect ()
> #3  0x012e6ae2 in impala::JniUtil::InitLibhdfs() ()
> #4  0x012e7897 in impala::JniUtil::Init() ()
> #5  0x00be9297 in impala::InitCommonRuntime(int, char**, bool, 
> impala::TestInfo::Mode) ()
> #6  0x00bb604a in CatalogdMain(int, char**) ()
> #7  0x00b33f97 in main (){noformat}
> It indicates something wrong in initializing the JVM. Here are the env vars:
> {noformat}
> Environment Variables:
> JAVA_HOME=/usr/java/jdk1.8.0_141
> CLASSPATH=/opt/impala/conf:/opt/impala/jar/*
> PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin
> LD_LIBRARY_PATH=/opt/impala/lib/:/usr/java/jdk1.8.0_141/jre/lib/amd64/server:/usr/java/jdk1.8.0_141/jre/lib/amd64
> SHELL=/bin/bash{noformat}
> We use wildcard "*" in the classpath which seems to be the cause. The issue 
> was resolved after using explicit paths in the classpath. Here are what I 
> changed in bin/impala-env.sh:
> {code:bash}
> #export CLASSPATH="/opt/impala/conf:/opt/impala/jar/*"
> CLASSPATH=/opt/impala/conf
> for jar in /opt/impala/jar/*.jar; do
>   CLASSPATH="$CLASSPATH:$jar"
> done
> export CLASSPATH
> {code}
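The same expansion can be generated programmatically. A hedged Python sketch (the `explicit_classpath` helper is illustrative, not part of Impala; the underlying assumption is that wildcard expansion is a feature of the `java` launcher, which a JVM created through JNI does not perform):

```python
# Build an explicit classpath from a conf dir plus every jar in a directory,
# mirroring the shell loop in the quoted fix.
import glob
import os

def explicit_classpath(conf_dir: str, jar_dir: str) -> str:
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ":".join([conf_dir] + jars)
```

The result would be exported as CLASSPATH before the daemons start, just as bin/impala-env.sh does above.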






[jira] [Work started] (IMPALA-12986) Base64Encode fails if the 'out_len' output parameter is passed with certain values

2024-04-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12986 started by Daniel Becker.
--
> Base64Encode fails if the 'out_len' output parameter is passed with certain 
> values
> --
>
> Key: IMPALA-12986
> URL: https://issues.apache.org/jira/browse/IMPALA-12986
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> The Base64Encode function in coding-util.h with signature 
> {code:java}
> bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, 
> int64_t* out_len);{code}
> fails if '*out_len', when passed to the function, contains a negative value 
> or a value that does not fit in a 32-bit integer.
> Internally we use the 
>  
> {code:java}
> int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned outmax, 
> unsigned *outlen);{code}
>  
> function and explicitly cast 'out_len' to 'unsigned*'.
> The success of this function shouldn't depend on the value of '*out_len' 
> because it is an output parameter, so we should set '*out_len' to zero before 
> passing it to {{{}sasl_encode64(){}}}.






[jira] [Created] (IMPALA-12986) Base64Encode fails if the 'out_len' output parameter is passed with certain values

2024-04-09 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-12986:
--

 Summary: Base64Encode fails if the 'out_len' output parameter is 
passed with certain values
 Key: IMPALA-12986
 URL: https://issues.apache.org/jira/browse/IMPALA-12986
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


The Base64Encode function in coding-util.h with signature 
{code:java}
bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, 
int64_t* out_len);{code}
fails if '*out_len', when passed to the function, contains a negative value or 
a value that does not fit in a 32-bit integer.

Internally we use the 
 
{code:java}
int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned outmax, 
unsigned *outlen);{code}
 
function and explicitly cast 'out_len' to 'unsigned*'.

The success of this function shouldn't depend on the value of '*out_len' 
because it is an output parameter, so we should set '*out_len' to zero before 
passing it to {{{}sasl_encode64(){}}}.
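The failure mode can be modeled in a few lines. The Python below is an illustrative stand-in for the C++ behavior, not Impala's code, and it assumes a little-endian layout (which is where the unsigned* cast works at all):

```python
# Model the 8-byte int64_t out_len slot: a callee that sees it through an
# 'unsigned*' cast overwrites only the low 4 bytes, so any stale high bytes
# survive and corrupt the 64-bit result.
import struct

def write_low32(slot: bytearray, value: int) -> None:
    slot[0:4] = struct.pack("<I", value)  # the 32-bit unsigned* store

def as_int64(slot: bytearray) -> int:
    return struct.unpack("<q", bytes(slot))[0]

stale = bytearray(struct.pack("<q", -1))  # slot previously held -1: all 0xFF
write_low32(stale, 4)
print(as_int64(stale))   # -4294967292, not 4: high bytes are still 0xFF

zeroed = bytearray(8)    # the proposed fix: *out_len = 0 before the call
write_low32(zeroed, 4)
print(as_int64(zeroed))  # 4
```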






[jira] [Updated] (IMPALA-12810) Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12810:
---
Description: Now that we have the DIRECTED distribution mode, some parts of 
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to 
simplify the above classes.  (was: Now that we have the BROADCAST distribution 
mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead 
code. It is time to simplify the above classes.)

> Simplify IcebergDeleteNode and IcebergDeleteBuilder
> ---
>
> Key: IMPALA-12810
> URL: https://issues.apache.org/jira/browse/IMPALA-12810
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Now that we have the DIRECTED distribution mode, some parts of 
> IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to 
> simplify the above classes.






[jira] [Commented] (IMPALA-12048) Add query progress of a query to Impala WebUI

2024-04-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835347#comment-17835347
 ] 

ASF subversion and git services commented on IMPALA-12048:
--

Commit 5c003cdcda604c43dd836571db02afcfbdc05dbc in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5c003cdcd ]

IMPALA-12978: Fix impala-shell`s live progress with older Impalas

If the Impala server has an older version that does not contain
IMPALA-12048 then TExecProgress.total_fragment_instances will be
None, leading to error when checking total_fragment_instances > 0.

Note that this issue only comes with Python 3, in Python 2 None > 0
returns False.

Testing:
- Manually checked with a modified Impala that doesn't set
  total_fragment_instances. Only the scanner progress bar is shown
  in this case.

Change-Id: Ic6562ff6c908bfebd09b7612bc5bcbd92623a8e6
Reviewed-on: http://gerrit.cloudera.org:8080/21256
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 
Reviewed-by: Zihao Ye 
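The guard described in the commit can be sketched as a None-check before the comparison (the helper name below is illustrative, not impala-shell's actual code):

```python
# On Python 3, `None > 0` raises TypeError; Python 2 quietly returned False.
# Checking for None first keeps the shell compatible with servers that
# predate the Thrift field added in IMPALA-12048.
def show_fragment_progress(total_fragment_instances):
    return total_fragment_instances is not None and total_fragment_instances > 0

print(show_fragment_progress(None))  # False: older server, field absent
print(show_fragment_progress(8))     # True
```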


> Add query progress of a query to Impala WebUI
> -
>
> Key: IMPALA-12048
> URL: https://issues.apache.org/jira/browse/IMPALA-12048
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: YifanZhang
>Assignee: YifanZhang
>Priority: Major
>
> Now we can get the scan progress of a query from the /queries webpage. For 
> CPU-intensive queries, the scan progress may reach 100% soon,  but there is 
> still some time before a query is completed. In this case, we may need 
> another progress bar to show the progress of query execution. 






[jira] [Commented] (IMPALA-12978) IMPALA-12544 made impala-shell incompatible with old impala servers

2024-04-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835346#comment-17835346
 ] 

ASF subversion and git services commented on IMPALA-12978:
--

Commit 5c003cdcda604c43dd836571db02afcfbdc05dbc in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5c003cdcd ]

IMPALA-12978: Fix impala-shell`s live progress with older Impalas

If the Impala server has an older version that does not contain
IMPALA-12048 then TExecProgress.total_fragment_instances will be
None, leading to error when checking total_fragment_instances > 0.

Note that this issue only comes with Python 3, in Python 2 None > 0
returns False.

Testing:
- Manually checked with a modified Impala that doesn't set
  total_fragment_instances. Only the scanner progress bar is shown
  in this case.

Change-Id: Ic6562ff6c908bfebd09b7612bc5bcbd92623a8e6
Reviewed-on: http://gerrit.cloudera.org:8080/21256
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 
Reviewed-by: Zihao Ye 


> IMPALA-12544 made impala-shell incompatible with old impala servers
> ---
>
> Key: IMPALA-12978
> URL: https://issues.apache.org/jira/browse/IMPALA-12978
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> IMPALA-12544 uses "progress.total_fragment_instances > 0:", but 
> total_fragment_instances is None if the server is older and does not yet know 
> this Thrift member (added in IMPALA-12048). 
> [https://github.com/apache/impala/blob/fb3c379f395635f9f6927b40694bc3dd95a2866f/shell/impala_shell.py#L1320]
>  
> This leads to error messages in interactive shell sessions when progress 
> reporting is enabled.






[jira] [Work started] (IMPALA-12651) Add support to BINARY type Iceberg Metadata table columns

2024-04-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12651 started by Daniel Becker.
--
> Add support to BINARY type Iceberg Metadata table columns
> -
>
> Key: IMPALA-12651
> URL: https://issues.apache.org/jira/browse/IMPALA-12651
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Tamas Mate
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> Impala should be able to read BINARY type columns from Iceberg metadata 
> tables as strings; additionally, this should be allowed when reading these 
> types from within complex types.






[jira] [Assigned] (IMPALA-12651) Add support to BINARY type Iceberg Metadata table columns

2024-04-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker reassigned IMPALA-12651:
--

Assignee: Daniel Becker

> Add support to BINARY type Iceberg Metadata table columns
> -
>
> Key: IMPALA-12651
> URL: https://issues.apache.org/jira/browse/IMPALA-12651
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Tamas Mate
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> Impala should be able to read BINARY type columns from Iceberg metadata 
> tables as strings; additionally, this should be allowed when reading these 
> types from within complex types.






[jira] [Assigned] (IMPALA-8809) Refresh a subset of partitions for ACID tables

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab reassigned IMPALA-8809:


Assignee: (was: Gabor Kaszab)

> Refresh a subset of partitions for ACID tables
> --
>
> Key: IMPALA-8809
> URL: https://issues.apache.org/jira/browse/IMPALA-8809
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Gabor Kaszab
>Priority: Critical
>  Labels: impala-acid
>
> Enhancing REFRESH logic to handle ACID tables was covered by this change: 
> https://issues.apache.org/jira/browse/IMPALA-8600
> Basically, each user-initiated REFRESH PARTITION is rejected, while the 
> REFRESH_PARTITION events in the event processor actually do a full table 
> load for ACID tables.
> There is room for improvement: When a full table refresh is being executed on 
> an ACID table we can have 2 scenarios:
> - If there were some schema changes, then reload the full table. Identifying 
> such a scenario should be possible by checking the table-level writeId. 
> However, there is a bug in Hive in that it doesn't update that field for 
> partitioned tables (https://issues.apache.org/jira/browse/HIVE-22062). This 
> would be the desired way, but it could also be worked around by checking 
> other fields like lastDdlChanged or such.
> - If a full table refresh is not needed then we should fetch the 
> partition-level writeIds and reload only the ones that are out-of-date 
> locally.






[jira] [Work stopped] (IMPALA-8809) Refresh a subset of partitions for ACID tables

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8809 stopped by Gabor Kaszab.

> Refresh a subset of partitions for ACID tables
> --
>
> Key: IMPALA-8809
> URL: https://issues.apache.org/jira/browse/IMPALA-8809
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: impala-acid
>
> Enhancing REFRESH logic to handle ACID tables was covered by this change: 
> https://issues.apache.org/jira/browse/IMPALA-8600
> Basically, each user-initiated REFRESH PARTITION is rejected, while the 
> REFRESH_PARTITION events in the event processor actually do a full table 
> load for ACID tables.
> There is room for improvement: When a full table refresh is being executed on 
> an ACID table we can have 2 scenarios:
> - If there were some schema changes, then reload the full table. Identifying 
> such a scenario should be possible by checking the table-level writeId. 
> However, there is a bug in Hive in that it doesn't update that field for 
> partitioned tables (https://issues.apache.org/jira/browse/HIVE-22062). This 
> would be the desired way, but it could also be worked around by checking 
> other fields like lastDdlChanged or such.
> - If a full table refresh is not needed then we should fetch the 
> partition-level writeIds and reload only the ones that are out-of-date 
> locally.






[jira] [Resolved] (IMPALA-12729) Allow creating primary keys for Iceberg tables

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-12729.
---
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Allow creating primary keys for Iceberg tables
> --
>
> Key: IMPALA-12729
> URL: https://issues.apache.org/jira/browse/IMPALA-12729
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.4.0
>
>
> Some writer engines require primary keys on a table so that they can use them 
> for writing equality deletes (only the PK cols are written to the eq-delete 
> files).
> Impala currently doesn't reject setting PKs for Iceberg tables; however, it 
> seems to omit them. This succeeds:
> {code:java}
> create table ice_pk (i int, j int, primary key(i)) stored as iceberg;
> {code}
> However, DESCRIBE EXTENDED doesn't show 'identifier-field-ids' in the 
> 'current-schema'.
> On the other hand for a table created by Flink these fields are there:
> {code:java}
> current-schema                                     | 
> {\"type\":\"struct\",\"schema-id\":0,\"identifier-field-ids\":[1],\"fields\":[{\"id\":1,\"name\":\"i\",\"required\":true,\"type\":\"int\"},{\"id\":2,\"name\":\"s\",\"required\":false,\"type\":\"string\"}]}
>  {code}
> Part2:
> SHOW CREATE TABLE should also correctly print the primary key part of the 
> field list.






[jira] [Resolved] (IMPALA-11387) Add virtual column ICEBERG__SEQUENCE__NUMBER

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-11387.
---
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> Add virtual column ICEBERG__SEQUENCE__NUMBER
> 
>
> Key: IMPALA-11387
> URL: https://issues.apache.org/jira/browse/IMPALA-11387
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.3.0
>
>
> A virtual column ICEBERG__SEQUENCE__NUMBER is needed to handle row-level 
> updates.
> See details at:
>  https://iceberg.apache.org/spec/#scan-planning
> This could be written in the template tuple, similarly to INPUT__FILE__NAME.






[jira] [Resolved] (IMPALA-12694) Test equality delete support with data from NiFi

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-12694.
---
Fix Version/s: Not Applicable
   Resolution: Fixed

> Test equality delete support with data from NiFi
> 
>
> Key: IMPALA-12694
> URL: https://issues.apache.org/jira/browse/IMPALA-12694
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Not Applicable
>
>
> Iceberg equality delete support in Impala covers a subset of what the Iceberg 
> spec allows for equality deletes. Currently, we have sufficient 
> implementation to use eq-deletes created by Flink. As a next step, let's 
> examine whether this implementation is sufficient for eq-deletes created by NiFi.
> In theory, NiFi uses Flink's eq-delete implementation, so Impala should be 
> fine reading such data. However, at least some manual tests are needed for 
> verification, and if it turns out that there are uncovered edge cases, we 
> should fill those holes in the implementation (probably in separate Jiras).






[jira] [Resolved] (IMPALA-12600) Support equality deletes when table has partition or schema evolution

2024-04-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-12600.
---
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Support equality deletes when table has partition or schema evolution
> -
>
> Key: IMPALA-12600
> URL: https://issues.apache.org/jira/browse/IMPALA-12600
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> With the basic equality delete read support added, we reject queries for 
> Iceberg tables that have equality delete files and partition or schema 
> evolution. This ticket is to extend this functionality.






[jira] [Updated] (IMPALA-12984) Show inactivity of data exchanges in query profile

2024-04-09 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated IMPALA-12984:
---
Attachment: Profile with slow data exchanges.txt

> Show inactivity of data exchanges in query profile
> --
>
> Key: IMPALA-12984
> URL: https://issues.apache.org/jira/browse/IMPALA-12984
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Riza Suminto
>Priority: Major
> Attachments: Profile with slow data exchanges.txt
>
>
> Many-to-many data exchanges can be bottlenecked by a hotspot receiver, as in 
> the scenario described in IMPALA-6692, or when data spilling happens in a 
> subset of backends. Ideally, such occurrences should be easy to spot in the 
> query profile, but triaging this kind of issue often requires correlation 
> analysis of several counters in the query profile. There are a few ideas on 
> how to improve this identification:
>  # Upon query completion, let the coordinator do some profile analysis and 
> print a WARNING in the query profile pointing at the skew. One group of 
> EXCHANGE senders and receivers can only complete simultaneously, since all 
> receivers need to wait for the EOS signal from all senders. Say we take the 
> max of TotalNetworkSendTime over all senders and the max of DataWaitTime over 
> all receivers; a "mutual wait" time of min(TotalNetworkSendTime, DataWaitTime) 
> can be used as an indicator of how long the exchanges are waiting for the 
> query operators above them to progress.
>  # Add a "Max Inactive" column to the ExecSummary table. The existing "Avg 
> Time" and "Max Time" are derived from RuntimeProfileBase::local_time_ns_. If 
> ExecSummary also displayed the maximum value of 
> RuntimeProfileBase::inactive_timer_ for each query operator as "Max 
> Inactive", we could compare it against "Max Time" and figure out which 
> exchange is mostly idle waiting. The relation between local_time_ns, 
> children_total_time, and inactive_timer can be seen at 
> [https://github.com/apache/impala/blob/0721858/be/src/util/runtime-profile.cc#L935-L938]
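Idea 1 above can be sketched in a few lines. The following is a hypothetical illustration of the "mutual wait" computation (the counter names come from the description; the profile-parsing layer is assumed, not shown), not actual Impala code:

```python
# Hedged sketch of the "mutual wait" heuristic: given per-sender
# TotalNetworkSendTime and per-receiver DataWaitTime values (in ns) for
# one exchange, the mutual wait is min(max(send_times), max(wait_times)),
# since senders and receivers of one exchange finish together.

def mutual_wait_ns(send_times_ns, wait_times_ns):
    """Estimate how long an exchange idled waiting on operators above it."""
    if not send_times_ns or not wait_times_ns:
        return 0
    return min(max(send_times_ns), max(wait_times_ns))

# Example: senders spent up to 9s blocked sending, receivers up to 7s
# waiting for data, so the exchange had at least 7s of mutual wait.
print(mutual_wait_ns([4_000_000_000, 9_000_000_000],
                     [7_000_000_000, 2_000_000_000]))
```

A coordinator-side analysis pass could compute this per exchange after query completion and emit the WARNING when the value is a large fraction of query runtime.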






[jira] [Commented] (IMPALA-9703) Skip loading partition meta and file meta for PB scale tables

2024-04-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835293#comment-17835293
 ] 

Quanlong Huang commented on IMPALA-9703:


Thanks, [~tangzhi]! Assigning this to you.

This requires a large change. Let's start the design before coding.

> Skip loading partition meta and file meta for PB scale tables
> -
>
> Key: IMPALA-9703
> URL: https://issues.apache.org/jira/browse/IMPALA-9703
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Zhi Tang
>Priority: Critical
>  Labels: catalog-2024
>
> PB-scale tables that have >100K partitions may hit catalog limitations. 
> Caching all the partitions is also wasteful since usually only a few of them 
> are required. Queries scanning all partitions would probably fail with 
> resource limitation errors anyway, so they are out of scope.
> This JIRA tracks the work to skip caching the partition meta of a table. 
> Catalogd will only cache the HmsTable object and the partition list 
> (partition names, e.g. "p1=a/p2=b", and internal partition ids generated by 
> Impala). Coordinators fetch the partition meta on demand when compiling 
> queries.
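As a rough illustration of the proposed split, a catalog that keeps only the partition-name list and fetches full partition meta lazily could look like the sketch below (class and attribute names are invented for the example, not Impala's actual catalog classes):

```python
# Hedged sketch: catalogd caches only partition names -> internal ids,
# and the coordinator side loads full partition meta on demand.

class LazyPartitionCatalog:
    def __init__(self, partition_names, fetch_meta):
        # e.g. {"p1=a/p2=b": 0, ...}: names mapped to internal partition ids
        self.partition_ids = {name: i for i, name in enumerate(partition_names)}
        self._fetch_meta = fetch_meta   # callback that loads meta on demand
        self._meta_cache = {}

    def get_meta(self, name):
        """Fetch and memoize partition meta only when a query needs it."""
        if name not in self._meta_cache:
            self._meta_cache[name] = self._fetch_meta(name)
        return self._meta_cache[name]

fetches = []
catalog = LazyPartitionCatalog(
    ["p1=a/p2=b", "p1=a/p2=c"],
    lambda name: (fetches.append(name), {"location": "/warehouse/" + name})[1])
catalog.get_meta("p1=a/p2=b")
catalog.get_meta("p1=a/p2=b")   # second call is served from the cache
print(fetches)                  # only one remote fetch happened
```

The design question the ticket raises is essentially where this memoization lives and how it is invalidated when HMS events arrive.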






[jira] [Assigned] (IMPALA-9703) Skip loading partition meta and file meta for PB scale tables

2024-04-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-9703:
--

Assignee: Zhi Tang

> Skip loading partition meta and file meta for PB scale tables
> -
>
> Key: IMPALA-9703
> URL: https://issues.apache.org/jira/browse/IMPALA-9703
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Zhi Tang
>Priority: Critical
>  Labels: catalog-2024
>
> PB-scale tables that have >100K partitions may hit catalog limitations. 
> Caching all the partitions is also wasteful since usually only a few of them 
> are required. Queries scanning all partitions would probably fail with 
> resource limitation errors anyway, so they are out of scope.
> This JIRA tracks the work to skip caching the partition meta of a table. 
> Catalogd will only cache the HmsTable object and the partition list 
> (partition names, e.g. "p1=a/p2=b", and internal partition ids generated by 
> Impala). Coordinators fetch the partition meta on demand when compiling 
> queries.






[jira] [Updated] (IMPALA-12152) Add query option to wait for events sync up

2024-04-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12152:

Labels: catalog-2024  (was: )

> Add query option to wait for events sync up
> ---
>
> Key: IMPALA-12152
> URL: https://issues.apache.org/jira/browse/IMPALA-12152
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: catalog-2024
>
> Event-processor is designed to get rid of manual RT/IM (RefreshTable / 
> InvalidateMetadata) commands that sync up with external HMS modifications. 
> However, event processing could be delayed. Queries might still see stale 
> metadata if the event-processor is lagging behind. We should provide a 
> mechanism to let query planning wait until the metadata is synced up. Users 
> can turn it on for sensitive queries that depend on external modifications.






[jira] [Updated] (IMPALA-12962) Estimated metadata size of a table doesn't match the actual java object size

2024-04-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12962:

Labels: catalog-2024  (was: )

> Estimated metadata size of a table doesn't match the actual java object size
> 
>
> Key: IMPALA-12962
> URL: https://issues.apache.org/jira/browse/IMPALA-12962
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: catalog-2024
>
> Catalogd shows the top-25 largest tables in its WebUI at the "/catalog" 
> endpoint. The estimated metadata size is computed in 
> HdfsTable#getTHdfsTable():
> [https://github.com/apache/impala/blob/0d49c9d6cc7fc0903d60a78d8aaa996af0249c06/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2414-L2451]
> The current formula is
>  * memUsageEstimate = numPartitions * 2KB + numFiles * 500B + numBlocks * 
> 150B + (optional) incrementalStats
>  * (optional) incrementalStats = numPartitions * numColumns * 200B
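For illustration, the quoted formula can be recomputed as follows (the byte constants come from the description above; this is a sketch for checking the arithmetic, not the actual HdfsTable#getTHdfsTable() code):

```python
# Hedged sketch of the current estimate formula:
#   numPartitions * 2KB + numFiles * 500B + numBlocks * 150B
#   + (optional) numPartitions * numColumns * 200B for incremental stats.

def mem_usage_estimate(num_partitions, num_files, num_blocks,
                       num_columns=0, has_incremental_stats=False):
    estimate = (num_partitions * 2048      # 2KB per partition
                + num_files * 500          # 500B per file
                + num_blocks * 150)        # 150B per block
    if has_incremental_stats:
        estimate += num_partitions * num_columns * 200
    return estimate

# 1000 partitions, 10K files, 20K blocks, 50 columns w/ incremental stats:
print(mem_usage_estimate(1000, 10_000, 20_000, 50, True))
```

Comparing such estimates against measurements from ehcache-sizeof or jamm on real tables would show how far off the constants are for wide tables or long tblproperties.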
> It's OK to use this formula to compare tables, but it can't be used to 
> estimate the max heap size of catalogd. E.g., it doesn't consider the column 
> comments and tblproperties, which could contain long strings. Column names 
> should also be considered in case the table is a wide table.
> We can compare the estimated sizes with results from ehcache-sizeof or jamm 
> and update the formula, or use these libraries to estimate the sizes directly 
> if they don't impact performance.
> CC [~MikaelSmith] 


