[jira] [Created] (IMPALA-9237) Make package binutils work on aarch64
huangtianhua created IMPALA-9237:
Summary: Make package binutils work on aarch64
Key: IMPALA-9237
URL: https://issues.apache.org/jira/browse/IMPALA-9237
Project: IMPALA
Issue Type: Sub-task
Affects Versions: Impala 3.4.0
Reporter: huangtianhua

Running ./buildall.sh on an aarch64 instance raises errors when configuring binutils:

{code}
checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking target system type... aarch64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
checking whether ln -s works... yes
checking for a sed that does not truncate output... /bin/sed
checking for gawk... gawk
checking for gcc... /tmp/tmp.cyR1NfqD1D-impala-toolchain/gcc
checking for C compiler default output file name...
configure: error: in `/home/jenkins/workspace/native-toolchain/source/binutils/binutils-2.26.1':
configure: error: C compiler cannot create executables
See `config.log' for more details.
{code}

### config.log
{code}
Thread model: posix
gcc version 4.9.2 (GCC)
configure:4387: $? = 0
configure:4376: /tmp/tmp.cyR1NfqD1D-impala-toolchain/gcc -V >&5
gcc: error: unrecognized command line option '-V'
gcc: fatal error: no input files
compilation terminated.
configure:4387: $? = 1
configure:4376: /tmp/tmp.cyR1NfqD1D-impala-toolchain/gcc -qversion >&5
gcc: error: unrecognized command line option '-qversion'
gcc: fatal error: no input files
compilation terminated.
configure:4387: $? = 1
configure:4407: checking for C compiler default output file name
configure:4429: /tmp/tmp.cyR1NfqD1D-impala-toolchain/gcc -fPIC -O3 -m64 -mno-avx2 -Wl,-rpath,'$ORIGIN/../lib64',-rpath,'$$ORIGIN/../lib64',-rpath,'$ORIGIN/../lib',-rpath,'$$ORIGIN/../lib' -L/home/jenkins/workspace/native-toolchain/build/gcc-4.9.2/lib64 conftest.c >&5
gcc: error: unrecognized command line option '-m64'
gcc: error: unrecognized command line option '-mno-avx2'
configure:4433: $? = 1
configure:4470: result:
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| /* end confdefs.h. */
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
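The config.log excerpt shows the root cause: the toolchain's gcc wrapper passes x86_64-only codegen flags (-m64, -mno-avx2) unconditionally, and the aarch64 gcc rejects them. A minimal sketch of the kind of architecture-conditional flag selection a fix would need (the helper name and flag set are illustrative, not the actual native-toolchain code):

```python
import platform

# Codegen flags that only exist on x86_64 and that aarch64 gcc rejects outright.
X86_ONLY_FLAGS = {"-m64", "-mno-avx2"}

def select_cflags(base_flags, machine=None):
    """Drop flags that only exist on x86_64.

    Hypothetical helper, not the actual native-toolchain logic: on x86_64
    the flag list is passed through unchanged, on other architectures the
    x86-only flags are filtered out.
    """
    machine = machine or platform.machine()
    if machine == "x86_64":
        return list(base_flags)
    return [f for f in base_flags if f not in X86_ONLY_FLAGS]
```

With this in place, the configure invocation above would receive only `-fPIC -O3` (plus the rpath/linker options) on aarch64.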
[jira] [Created] (IMPALA-9236) Ported native-toolchain to work on aarch64
huangtianhua created IMPALA-9236:
Summary: Ported native-toolchain to work on aarch64
Key: IMPALA-9236
URL: https://issues.apache.org/jira/browse/IMPALA-9236
Project: IMPALA
Issue Type: Task
Reporter: huangtianhua

Make native-toolchain work on the aarch64 platform.
[jira] [Created] (IMPALA-9235) Backport Kudu socket stats for /rpcz
Tim Armstrong created IMPALA-9235:
Summary: Backport Kudu socket stats for /rpcz
Key: IMPALA-9235
URL: https://issues.apache.org/jira/browse/IMPALA-9235
Project: IMPALA
Issue Type: Improvement
Components: Distributed Exec
Reporter: Tim Armstrong
Assignee: Tim Armstrong

This Kudu commit adds some nice socket stats that will help diagnose network performance issues: https://github.com/apache/kudu/commit/0f6d33b4a29873197952335a5777ccf9163fc307

We should backport this and make sure we expose all of the useful per-connection information (Kudu already exposes more on its rpcz page because it dumps the protobufs to JSON).
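To illustrate the "dump per-connection info to JSON" part of the request, here is a small sketch of an /rpcz-style rendering. The field names are illustrative placeholders; the real values would come from the kernel's TCP_INFO as collected by the backported Kudu code:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ConnectionStats:
    # Illustrative per-connection fields, not the actual Kudu protobuf schema.
    remote_addr: str
    rtt_us: int            # smoothed round-trip time, microseconds
    retransmits: int       # segments retransmitted on this connection
    send_queue_bytes: int  # bytes queued but not yet sent

def rpcz_json(conns):
    """Render per-connection stats as JSON, the way an rpcz page might."""
    return json.dumps([asdict(c) for c in conns], indent=2)
```

Dumping the full record per connection (rather than a curated subset) is what lets Kudu's page show more detail with no extra plumbing.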
[jira] [Created] (IMPALA-9234) Support Ranger row filtering policies
Quanlong Huang created IMPALA-9234:
Summary: Support Ranger row filtering policies
Key: IMPALA-9234
URL: https://issues.apache.org/jira/browse/IMPALA-9234
Project: IMPALA
Issue Type: New Feature
Components: Security
Reporter: Quanlong Huang
[jira] [Created] (IMPALA-9233) Add impalad level metrics for query retries
Sahil Takiar created IMPALA-9233:
Summary: Add impalad level metrics for query retries
Key: IMPALA-9233
URL: https://issues.apache.org/jira/browse/IMPALA-9233
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar

It would be nice to have some impalad-level metrics related to query retries. These would help answer questions like: how often are queries retried? How often are the retries actually successful? If queries are constantly being retried, there is probably something wrong with the cluster.

Some possible metrics to add:
* Query retry rate (the rate at which queries are retried)
** This can be further divided by retry "type", e.g. what caused the retry
** Potential categories would be:
*** Queries retried due to failed RPCs
*** Queries retried due to faulty disks
*** Queries retried due to statestore detection of cluster membership changes
* A metric that measures how often query retries are actually successful (e.g. if a query is retried, does the retry succeed, or does it just fail again?)
** This can help users determine if query retries are actually helping, or just adding overhead (e.g. if retries always fail, something is probably wrong)
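The counters described above can be sketched as a small metrics object (names and categories are illustrative, not the actual impalad metric names):

```python
from collections import Counter

class RetryMetrics:
    """Hypothetical impalad-level counters for query retries."""

    def __init__(self):
        # Retries bucketed by cause, e.g. 'failed_rpc', 'faulty_disk',
        # 'membership_change' (illustrative category names).
        self.retries_by_cause = Counter()
        self.retries_succeeded = 0
        self.retries_failed = 0

    def record_retry(self, cause):
        self.retries_by_cause[cause] += 1

    def record_outcome(self, succeeded):
        if succeeded:
            self.retries_succeeded += 1
        else:
            self.retries_failed += 1

    def retry_success_rate(self):
        """Fraction of retries that succeeded; None if no retry finished yet."""
        total = self.retries_succeeded + self.retries_failed
        return self.retries_succeeded / total if total else None
```

A success rate near zero would indicate retries are only adding overhead, which is exactly the signal the issue asks for.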
[jira] [Assigned] (IMPALA-9232) Potential overflow in SerializeThriftMsg
[ https://issues.apache.org/jira/browse/IMPALA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-9232:
Assignee: (was: Tim Armstrong)

> Potential overflow in SerializeThriftMsg
>
> Key: IMPALA-9232
> URL: https://issues.apache.org/jira/browse/IMPALA-9232
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Tim Armstrong
> Priority: Major
>
> https://github.com/apache/impala/blob/master/be/src/rpc/jni-thrift-util.h#L30
> A uint32_t is passed into NewByteArray and SetByteArrayRegion, which both take
> int32_t arguments.
> I think the RETURN_ERROR_IF_EXC checks would likely catch this, but it would be
> good to fix.
[jira] [Created] (IMPALA-9232) Potential overflow in SerializeThriftMsg
Tim Armstrong created IMPALA-9232:
Summary: Potential overflow in SerializeThriftMsg
Key: IMPALA-9232
URL: https://issues.apache.org/jira/browse/IMPALA-9232
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong

https://github.com/apache/impala/blob/master/be/src/rpc/jni-thrift-util.h#L30

A uint32_t is passed into NewByteArray and SetByteArrayRegion, which both take int32_t arguments. I think the RETURN_ERROR_IF_EXC checks would likely catch this, but it would be good to fix.
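The bug class is a signed/unsigned narrowing: JNI's jsize is a signed 32-bit int, so a uint32_t length above 2^31 - 1 wraps negative when narrowed. A sketch of the missing guard, modelled in Python (the function name is hypothetical; in the actual C++ fix this would be a bounds check before the NewByteArray/SetByteArrayRegion calls):

```python
# jsize is a signed 32-bit integer in JNI.
INT32_MAX = 2**31 - 1

def checked_jsize(length):
    """Hypothetical guard modelling the missing check: a uint32_t length
    above INT32_MAX would silently wrap negative when narrowed to jsize,
    so reject it explicitly instead."""
    if length < 0 or length > INT32_MAX:
        raise OverflowError(
            "serialized Thrift message of %d bytes does not fit in jsize" % length)
    return length
```

Rejecting the oversized message with a clear error is preferable to relying on RETURN_ERROR_IF_EXC catching the JNI exception after the fact.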
[jira] [Resolved] (IMPALA-9209) TestRPCException.test_end_data_stream_error is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-9209.
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> TestRPCException.test_end_data_stream_error is flaky
>
> Key: IMPALA-9209
> URL: https://issues.apache.org/jira/browse/IMPALA-9209
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Fix For: Impala 3.4.0
>
> custom_cluster.test_rpc_exception.TestRPCException.test_end_data_stream_error (from pytest)
>
> Error Message
> {code}
> assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0)
> + where 8 = >( PID: 5846 (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 --default_query_options=)>)
> + where TestRPCException._get_num_fails of object at 0x698c790>> = 0x698c790>._get_num_fails
> {code}
> Stacktrace
> {code}
> custom_cluster/test_rpc_exception.py:123: in test_end_data_stream_error
>     self.execute_test_query("Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5")
> custom_cluster/test_rpc_exception.py:75: in execute_test_query
>     assert not exception_string or self._get_num_fails(impalad) == 0
> E   assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0)
> E   + where 8 = >( PID: 5846 (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 --default_query_options=)>)
> E   + where = ._get_num_fails
> {code}
> Standard Error
> {code}
> 23:14:02 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 23:14:02 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 23:14:02 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:05 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:05 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 23:14:06 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:06 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:06 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 23:14:07 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:07 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:07 MainThread: num_known_live_backends has reached value: 3
> 23:14:07 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:07 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25001
> 23:14:07 MainThread: num_known_live_backends has reached value: 3
> 23:14:08 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:08 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25002
> 23:14:08 MainThread: num_known_live_backends has reached value: 3
> 23:14:09 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
> DEBUG:impala_cluster:Found 3 impalad/1 statestored/1 catalogd process(es)
> INFO:impala_service:Getting metric: statestore.live-backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25010
> INFO:impala_service:Metric 'statestore.live-backends' has reached desired value: 4
> DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> INFO:impala_service:num_known_live_backends has reached value: 3
> DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25001
> INFO:impala_service:num_known_live_backends has reached value: 3
[jira] [Resolved] (IMPALA-9209) TestRPCException.test_end_data_stream_error is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall resolved IMPALA-9209. Fix Version/s: Impala 3.4.0 Resolution: Fixed > TestRPCException.test_end_data_stream_error is flaky > > > Key: IMPALA-9209 > URL: https://issues.apache.org/jira/browse/IMPALA-9209 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Thomas Tauber-Marshall >Priority: Major > Fix For: Impala 3.4.0 > > > custom_cluster.test_rpc_exception.TestRPCException.test_end_data_stream_error > (from pytest) > Error Message > {code} > assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0) + where > 8 = >( PID: 5846 > (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 > --default_query_options=)>) +where TestRPCException._get_num_fails of object at 0x698c790>> = 0x698c790>._get_num_fails > {code} > Stacktrace > {code} > custom_cluster/test_rpc_exception.py:123: in test_end_data_stream_error > self.execute_test_query("Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5") > custom_cluster/test_rpc_exception.py:75: in execute_test_query > assert not exception_string or self._get_num_fails(impalad) == 0 > E assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0) > E+ where 8 = >( PID: 5846 > (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 > --default_query_options=)>) > E+where > = > ._get_num_fails > {code} > Standard Error > {code} > 23:14:02 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 23:14:02 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 23:14:02 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 23:14:02 MainThread: Starting Impala Daemon 
logging to > /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 23:14:02 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 23:14:02 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:05 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000 > 23:14:05 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 23:14:06 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:06 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000 > 23:14:06 MainThread: Waiting for num_known_live_backends=3. 
Current value: 0 > 23:14:07 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:07 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000 > 23:14:07 MainThread: num_known_live_backends has reached value: 3 > 23:14:07 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:07 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25001 > 23:14:07 MainThread: num_known_live_backends has reached value: 3 > 23:14:08 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 23:14:08 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25002 > 23:14:08 MainThread: num_known_live_backends has reached value: 3 > 23:14:09 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 > executors). > DEBUG:impala_cluster:Found 3 impalad/1 statestored/1 catalogd process(es) > INFO:impala_service:Getting metric: statestore.live-backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25010 > INFO:impala_service:Metric 'statestore.live-backends' has reached desired > value: 4 > DEBUG:impala_service:Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000 > INFO:impala_service:num_known_live_backends has reached value: 3 > DEBUG:impala_service:Getting num_known_live_backends from > impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25001 > INFO:impala_service:num_known_live_backends has reached value: 3 >
[jira] [Commented] (IMPALA-9209) TestRPCException.test_end_data_stream_error is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992988#comment-16992988 ]

ASF subversion and git services commented on IMPALA-9209:

Commit d72fd9a99927045a5b5ef9f4e617b1b6613e1574 in impala's branch refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d72fd9a ]

IMPALA-9209: Fix flakiness in test_end_data_stream_error

TestRPCException.execute_test_query is a helper function that is used by tests that set an RPC debug action to repeatedly run a query until the debug action is hit. Previously, it required that either the query is expected to always succeed, or it must always fail if the debug action is hit and an expected error is provided.

However, the two tests that have an expected error, test_end_data_stream_error and test_transmit_data_error, both set two debug actions: one that will cause a query failure and one that won't (because we always retry 'reject too busy' errors). If only the debug action that doesn't cause query failure is hit, the query won't fail and 'execute_test_query' will fail on the assert that expects that the query must fail if the action was hit. This is rare, as both debug actions have a high probability of being hit on a given run of the query.

The solution is to remove the requirement from the tests that the query must fail if an expected error is provided and the debug action is hit.

Change-Id: I499955b2d61c6b806f78e124c7ab919b242921bc
Reviewed-on: http://gerrit.cloudera.org:8080/14870
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> TestRPCException.test_end_data_stream_error is flaky
>
> Key: IMPALA-9209
> URL: https://issues.apache.org/jira/browse/IMPALA-9209
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Thomas Tauber-Marshall
> Priority: Major
>
> custom_cluster.test_rpc_exception.TestRPCException.test_end_data_stream_error (from pytest)
>
> Error Message
> {code}
> assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0)
> + where 8 = >( PID: 5846 (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 --default_query_options=)>)
> + where TestRPCException._get_num_fails of object at 0x698c790>> = 0x698c790>._get_num_fails
> {code}
> Stacktrace
> {code}
> custom_cluster/test_rpc_exception.py:123: in test_end_data_stream_error
>     self.execute_test_query("Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5")
> custom_cluster/test_rpc_exception.py:75: in execute_test_query
>     assert not exception_string or self._get_num_fails(impalad) == 0
> E   assert (not 'Debug Action: IMPALA_SERVICE_POOL:FAIL@0.5' or 8 == 0)
> E   + where 8 = >( PID: 5846 (/data/jenkins/workspace/impala-cdh6.x-exhaustive/re..._SERVICE_POOL:127.0.0.1:27002:EndDataStream:FAIL@0.5 --default_query_options=)>)
> E   + where = ._get_num_fails
> {code}
> Standard Error
> {code}
> 23:14:02 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 23:14:02 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 23:14:02 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 23:14:02 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdh6.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:05 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:05 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:05 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 23:14:06 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:06 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:06 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 23:14:07 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 23:14:07 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-116e.vpc.cloudera.com:25000
> 23:14:07 MainThread:
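The relaxed requirement described in the commit can be modelled as a small predicate (a sketch with illustrative names, not the actual pytest helper):

```python
def outcome_acceptable(expected_error, query_error, action_hits):
    """Sketch of the relaxed execute_test_query check.

    If the query failed (query_error is not None), the failure must carry
    the expected error. If the query succeeded, that is now acceptable even
    when the debug action fired (action_hits > 0), because the second,
    always-retried 'reject too busy' action may have absorbed every hit.
    """
    if query_error is not None:
        return expected_error is not None and expected_error in query_error
    return True  # success no longer requires action_hits == 0
```

Under the old check, a successful query with action_hits > 0 was asserted as a failure; that is exactly the rare flake the fix removes.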
[jira] [Assigned] (IMPALA-9197) Hash table lookup should be read-only
[ https://issues.apache.org/jira/browse/IMPALA-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-9197:
Assignee: Tim Armstrong

> Hash table lookup should be read-only
>
> Key: IMPALA-9197
> URL: https://issues.apache.org/jira/browse/IMPALA-9197
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: multithreading
>
> For IMPALA-9156, we need concurrent lookups to the hash table to be thread safe. We are pretty close to that, except a few stats are maintained in HashTable and would be modified concurrently from multiple threads.
> We should modify those places to update the stats in HashTableCtx instead.
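The proposed split can be sketched as follows: the shared table is only read during a probe, while the mutable counters live on a per-thread context. This is a minimal model of the idea, not Impala's actual HashTable/HashTableCtx API:

```python
class HashTableCtx:
    """Per-thread context owning the mutable probe statistics."""
    def __init__(self):
        self.num_probes = 0
        self.num_misses = 0

class HashTable:
    """Shared table: probe() only reads it, so concurrent lookups stay safe."""
    def __init__(self, rows):
        self._buckets = {}
        for key, value in rows:
            self._buckets.setdefault(key, []).append(value)

    def probe(self, key, ctx):
        ctx.num_probes += 1             # stats updated on the caller's ctx,
        hits = self._buckets.get(key)   # never on the shared table
        if hits is None:
            ctx.num_misses += 1
            return []
        return hits
```

Because each probing thread passes its own ctx, no synchronization is needed on the table itself once the build phase is done.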
[jira] [Assigned] (IMPALA-9176) Make access to null-aware partition from PartitionedHashJoinNode read-only
[ https://issues.apache.org/jira/browse/IMPALA-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-9176:
Assignee: Tim Armstrong

> Make access to null-aware partition from PartitionedHashJoinNode read-only
>
> Key: IMPALA-9176
> URL: https://issues.apache.org/jira/browse/IMPALA-9176
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: multithreading
>
> Currently the accesses to null_aware_partition() are logically read-only (since the rows and other state are not mutated) and only access the build rows when pinned, but this is implemented using the built-in read iterator of BufferedTupleStream. This would prevent sharing of the build side for null-aware anti-join.
> We need to either allow multiple read iterators for a pinned stream, or build an auxiliary structure, e.g. an array of Tuple ptrs or FlatRowPtr.
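The second option named above (an auxiliary array of row pointers) can be sketched like this; the helper is hypothetical and only illustrates why a snapshot index lets many readers share one pinned stream:

```python
def build_row_index(pinned_stream):
    """Snapshot the pinned stream's rows into an auxiliary array (the
    analogue of an array of Tuple ptrs / FlatRowPtr). Each probe thread
    can then iterate the immutable index independently, instead of all
    sharing the stream's single built-in read iterator."""
    return tuple(pinned_stream)  # immutable: safe for concurrent readers

# Two independent read positions over the same build-side data.
stream = ["row0", "row1", "row2"]
index = build_row_index(stream)
reader_a = iter(index)
reader_b = iter(index)
```

Each reader advances its own position, which is exactly what the single built-in read iterator of BufferedTupleStream cannot provide today.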
[jira] [Assigned] (IMPALA-9156) Share broadcast join builds between fragments
[ https://issues.apache.org/jira/browse/IMPALA-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-9156:
Assignee: Tim Armstrong

> Share broadcast join builds between fragments
>
> Key: IMPALA-9156
> URL: https://issues.apache.org/jira/browse/IMPALA-9156
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: multithreading
>
> Following on from IMPALA-4224, which should add the logic to share a single builder between multiple probe sides.
[jira] [Resolved] (IMPALA-9162) Incorrect redundant predicate applied to outer join
[ https://issues.apache.org/jira/browse/IMPALA-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha resolved IMPALA-9162. Fix Version/s: Impala 3.4.0 Resolution: Fixed > Incorrect redundant predicate applied to outer join > --- > > Key: IMPALA-9162 > URL: https://issues.apache.org/jira/browse/IMPALA-9162 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Aman Sinha >Assignee: Aman Sinha >Priority: Major > Fix For: Impala 3.4.0 > > Attachments: create.sql.txt > > > Run the attached create.sql script to create the tables and view. The > following query shows an incorrect redundant predicate applied to the outer > join. This seems another variant of past issues such as IMPALA-7957 and > IMPALA-8386. > {noformat} > // Has a redundant predicate as 'Other predicates' on Outer Join > Query: explain select x.* from (select v1.c3, v1.max_c3 from v.t2 left join > v.v1 on t2.c2=v1.c3) as x > > > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] >hash predicates: c3 = t2.c2 >other predicates: c3 = max(c3) .<-- WRONG >runtime filters: RF000 <- t2.c2 >row-size=20B cardinality=397 > > --13:EXCHANGE [HASH(t2.c2)] > >00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > > > 12:EXCHANGE [HASH(c3)] > > > > 05:HASH JOIN [INNER JOIN, BROADCAST] > >hash predicates: c3 = max(c3) > >runtime filters: RF002 <- max(c3) > >row-size=16B cardinality=207 > {noformat} > > By comparison, the following query which does not have the v1.max_c3 column > in the SELECT list produces the correct plan: > {noformat} > // Does not have the redundant predicate > Query: explain select x.* from (select v1.c3 from v.t2 left join v.v1 on > t2.c2=v1.c3) as x > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] >hash predicates: c3 = t2.c2 >runtime filters: RF000 <- t2.c2 >row-size=20B cardinality=397 > --13:EXCHANGE [HASH(t2.c2)] >00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > 12:EXCHANGE 
[HASH(c3)] > 05:HASH JOIN [INNER JOIN, BROADCAST] >hash predicates: c3 = max(c3) >runtime filters: RF002 <- max(c3) >row-size=16B cardinality=207 > {noformat} > Due to the redundant predicate, the first query produces wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
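The wrong-results mechanism described above can be illustrated outside Impala. The following is a hypothetical Python sketch (not Impala code; table data is invented) that simulates a LEFT OUTER JOIN and then applies the inferred `c3 = max(c3)` predicate as a WHERE-style filter, showing how the NULL-extended rows that the outer join must keep get dropped:

```python
# Hypothetical sketch (not Impala code): simulates why an inferred
# equality predicate applied as a WHERE-style 'other predicate' after a
# LEFT OUTER JOIN wrongly drops the NULL-extended rows.

def left_join(left, right, key_l, key_r):
    """Naive nested-loop LEFT OUTER JOIN; unmatched left rows are NULL-extended."""
    out = []
    for l in left:
        matches = [r for r in right if r[key_r] == l[key_l]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, "c3": None, "max_c3": None})
    return out

t2 = [{"c2": 1}, {"c2": 2}, {"c2": 99}]        # c2=99 has no match in v1
v1 = [{"c3": 1, "max_c3": 1}, {"c3": 2, "max_c3": 2}]

joined = left_join(t2, v1, "c2", "c3")
# Correct plan: 3 rows, one of them NULL-extended for c2=99.
assert len(joined) == 3

# Buggy plan: the inferred predicate c3 = max(c3) is applied as a
# WHERE-style filter, which a NULL-extended row can never satisfy.
filtered = [r for r in joined if r["c3"] is not None and r["c3"] == r["max_c3"]]
assert len(filtered) == 2   # the NULL-extended row is wrongly dropped
```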
[jira] [Commented] (IMPALA-9231) Use simplified privilege checks for show databases
[ https://issues.apache.org/jira/browse/IMPALA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992869#comment-16992869 ] Vihang Karajgaonkar commented on IMPALA-9231: - [~stigahuang] Would you be interested in taking this up, since you worked on IMPALA-9002? > Use simplified privilege checks for show databases > -- > > Key: IMPALA-9231 > URL: https://issues.apache.org/jira/browse/IMPALA-9231 > Project: IMPALA > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Priority: Major > > IMPALA-9002 introduced a new flag {{simplify_check_on_show_tables}} which > enables reduced privilege checks for the {{show tables}} command. This > approach can also be used for the {{show databases}} command. > We may need to rename the flag or introduce a new flag, since the current flag > suggests it applies to the {{show tables}} command only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9162) Incorrect redundant predicate applied to outer join
[ https://issues.apache.org/jira/browse/IMPALA-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992862#comment-16992862 ] ASF subversion and git services commented on IMPALA-9162: - Commit df5c4061456abb947cec8add81b361b60c5d3ad8 in impala's branch refs/heads/master from Aman Sinha [ https://gitbox.apache.org/repos/asf?p=impala.git;h=df5c406 ] IMPALA-9162: Do not apply inferred predicate to outer joins When the planner migrates predicates to inline views, it also creates equivalent predicates based on the value transfer graph, which is built from the transitive relationships among join conditions. These newly inferred predicates are typically placed as 'other predicates' of an inner or outer join. However, for outer joins, this has the effect of adding extra predicates in the WHERE clause, which is incorrect since it may filter NULL values. Since the original query did not have null-filtering conditions in the WHERE clause, we should not add new ones. In this fix we do the following: during the migration of conjuncts to inline views, analyze each predicate of type A <op> B; if it is an inferred predicate and either the left or right slot references the output tuple of an outer join, the inferred predicate is ignored. Note that simple queries with a combination of inner and outer joins may not reproduce the problem. Due to the nature of predicate inferencing, some combination of subqueries, inner joins, and outer joins is needed. For the query pattern, please see the example in the JIRA. Tests: - Added plan tests with left and right outer joins to inline-view.test - One baseline plan in inline-view.test had to be updated - Manually ran a few queries in impala-shell to verify result correctness by checking that NULL values are being produced for outer joins.
- Ran regression tests on jenkins Change-Id: Ie9521bd768c4b333069c34d5c1e11b10ea535827 Reviewed-on: http://gerrit.cloudera.org:8080/14813 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Incorrect redundant predicate applied to outer join > --- > > Key: IMPALA-9162 > URL: https://issues.apache.org/jira/browse/IMPALA-9162 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Aman Sinha >Assignee: Aman Sinha >Priority: Major > Attachments: create.sql.txt > > > Run the attached create.sql script to create the tables and view. The > following query shows an incorrect redundant predicate applied to the outer > join. This seems another variant of past issues such as IMPALA-7957 and > IMPALA-8386. > {noformat} > // Has a redundant predicate as 'Other predicates' on Outer Join > Query: explain select x.* from (select v1.c3, v1.max_c3 from v.t2 left join > v.v1 on t2.c2=v1.c3) as x > > > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] >hash predicates: c3 = t2.c2 >other predicates: c3 = max(c3) .<-- WRONG >runtime filters: RF000 <- t2.c2 >row-size=20B cardinality=397 > > --13:EXCHANGE [HASH(t2.c2)] > >00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > > > 12:EXCHANGE [HASH(c3)] > > > > 05:HASH JOIN [INNER JOIN, BROADCAST] > >hash predicates: c3 = max(c3) > >runtime filters: RF002 <- max(c3) > >row-size=16B cardinality=207 > {noformat} > > By comparison, the following query which does not have the v1.max_c3 column > in the SELECT list produces the correct plan: > {noformat} > // Does not have the redundant predicate > Query: explain select x.* from (select v1.c3 from v.t2 left join v.v1 on > t2.c2=v1.c3) as x > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] >hash predicates: c3 = t2.c2 >runtime filters: RF000 <- t2.c2 >row-size=20B cardinality=397 > --13:EXCHANGE [HASH(t2.c2)] >00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > 12:EXCHANGE [HASH(c3)] > 05:HASH 
JOIN [INNER JOIN, BROADCAST] >hash predicates: c3 = max(c3) >runtime filters: RF002 <- max(c3) >row-size=16B cardinality=207 > {noformat} > Due to the redundant predicate, the first query produces wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
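The check described in the commit message can be sketched roughly as follows. This is an illustrative Python model only; the predicate representation, field names, and function name are invented for the example and are not the actual planner classes:

```python
# Hypothetical model of the fix: while migrating conjuncts into an
# inline view, drop an *inferred* binary predicate if either side's slot
# references the output tuple of an outer join. Field names are
# illustrative, not actual Impala planner structures.

def keep_predicate(pred, outer_join_tuples):
    """Return False for inferred predicates touching an outer join's output."""
    if not pred["inferred"]:
        return True                      # user-written predicates are kept
    return not (pred["lhs_tuple"] in outer_join_tuples
                or pred["rhs_tuple"] in outer_join_tuples)

outer = {"v1_out"}                       # tuple produced by the outer join
inferred_bad = {"inferred": True, "lhs_tuple": "v1_out", "rhs_tuple": "agg"}
user_written = {"inferred": False, "lhs_tuple": "v1_out", "rhs_tuple": "agg"}

assert keep_predicate(inferred_bad, outer) is False   # ignored by the fix
assert keep_predicate(user_written, outer) is True    # original semantics kept
```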
[jira] [Created] (IMPALA-9231) Use simplified privilege checks for show databases
Vihang Karajgaonkar created IMPALA-9231: --- Summary: Use simplified privilege checks for show databases Key: IMPALA-9231 URL: https://issues.apache.org/jira/browse/IMPALA-9231 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar IMPALA-9002 introduced a new flag {{simplify_check_on_show_tables}} which enables reduced privilege checks for the {{show tables}} command. This approach can also be used for {{show databases}} command. We may need to rename the flag or introduce a new flag since the current flag suggests it applies for {{show tables}} command only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-6159) DataStreamSender should transparently handle some connection reset by peer
[ https://issues.apache.org/jira/browse/IMPALA-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992849#comment-16992849 ] Sahil Takiar commented on IMPALA-6159: -- [~asherman] Michael mentioned there was some more work to do here; it might be related to the linked JIRAs that are still open. [~twmarshall] do you understand what else needs to be done for this JIRA? > DataStreamSender should transparently handle some connection reset by peer > -- > > Key: IMPALA-6159 > URL: https://issues.apache.org/jira/browse/IMPALA-6159 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.12.0 >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Critical > > A client-to-server KRPC connection can become stale if the socket was closed > on the server side for various reasons, such as idle connection removal or a > remote Impalad restart. Currently, the KRPC code will invoke the callback of > all RPCs using that stale connection with the failed status (e.g. "Connection > reset by peer"). DataStreamSender should pattern match against certain error > strings (as they are mostly output from strerror()) and retry the RPC > transparently. This may also be useful for KUDU-2192, which tracks the > effort to detect stuck connections and close them. In that case, we may also > want to transparently retry the RPC. > FWIW, KUDU-279 is tracking the effort to have a cleaner protocol for > connection teardown due to idle client connection removal on the server side. > However, Impala still needs to handle other reasons for a stale connection. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-6159) DataStreamSender should transparently handle some connection reset by peer
[ https://issues.apache.org/jira/browse/IMPALA-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992738#comment-16992738 ] Andrew Sherman commented on IMPALA-6159: [~stakiar] should this be closed? > DataStreamSender should transparently handle some connection reset by peer > -- > > Key: IMPALA-6159 > URL: https://issues.apache.org/jira/browse/IMPALA-6159 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.12.0 >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Critical > > A client-to-server KRPC connection can become stale if the socket was closed > on the server side for various reasons, such as idle connection removal or a > remote Impalad restart. Currently, the KRPC code will invoke the callback of > all RPCs using that stale connection with the failed status (e.g. "Connection > reset by peer"). DataStreamSender should pattern match against certain error > strings (as they are mostly output from strerror()) and retry the RPC > transparently. This may also be useful for KUDU-2192, which tracks the > effort to detect stuck connections and close them. In that case, we may also > want to transparently retry the RPC. > FWIW, KUDU-279 is tracking the effort to have a cleaner protocol for > connection teardown due to idle client connection removal on the server side. > However, Impala still needs to handle other reasons for a stale connection. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
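The retry behavior the issue proposes (pattern matching the failure message against known stale-connection strings from strerror() and retrying transparently) might look roughly like this sketch. It is illustrative Python, not KRPC or DataStreamSender code; the pattern list and retry budget are invented:

```python
# Hypothetical sketch of transparent retry on stale-connection errors.
# RETRYABLE patterns and max_attempts are assumptions for illustration.

RETRYABLE = ("Connection reset by peer", "Broken pipe")

def call_with_retry(rpc, max_attempts=3):
    """Call rpc(); retry only when the error text matches a known pattern."""
    last = None
    for _ in range(max_attempts):
        try:
            return rpc()
        except ConnectionError as e:
            last = e
            if not any(p in str(e) for p in RETRYABLE):
                raise                    # non-retryable failure propagates
    raise last                           # retry budget exhausted

attempts = {"n": 0}
def flaky():
    """Fails once with a retryable error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise ConnectionError("Connection reset by peer")
    return "ok"

assert call_with_retry(flaky) == "ok"
assert attempts["n"] == 2                # exactly one transparent retry
```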
[jira] [Updated] (IMPALA-9226) Improve string allocations of the ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-9226: -- Description: Currently the ORC scanner allocates new memory for each string values (except for fixed size strings): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] Besides the too many allocations and copying it's also bad for memory locality. Since ORC-501 StringVectorBatch has a member named 'blob' that contains the strings in the batch: [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] 'blob' has type DataBuffer which is movable, so Impala might be able to get ownership of it. Or, at least we could copy the whole blob array instead of copying the strings one-by-one. ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 1.5.5. ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] It uses dictionary encoding for storing the values. Impala could copy/move the dictionary as well. was: Currently the ORC scanner allocates new memory for each string values (except for fixed size strings): https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172 Since ORC-501 StringVectorBatch has a member named 'blob' that contains the strings in the batch: [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] 'blob' has type DataBuffer which is movable, so Impala might be able to get ownership of it. Or, at least we could copy the whole blob array instead of copying the strings one-by-one. ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 1.5.5. 
ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] It uses dictionary encoding for storing the values. Impala could copy/move the dictionary as well. > Improve string allocations of the ORC scanner > - > > Key: IMPALA-9226 > URL: https://issues.apache.org/jira/browse/IMPALA-9226 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: orc > > Currently the ORC scanner allocates new memory for each string values (except > for fixed size strings): > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] > Besides the too many allocations and copying it's also bad for memory > locality. > Since ORC-501 StringVectorBatch has a member named 'blob' that contains the > strings in the batch: > [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] > 'blob' has type DataBuffer which is movable, so Impala might be able to get > ownership of it. Or, at least we could copy the whole blob array instead of > copying the strings one-by-one. > ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC > 1.5.5. > ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: > [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] > It uses dictionary encoding for storing the values. Impala could copy/move > the dictionary as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9226) Improve string allocations of the ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-9226: -- Description: Currently the ORC scanner allocates new memory for each string values (except for fixed size strings): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] Besides the too many allocations and copying it's also bad for memory locality. Since ORC-501 StringVectorBatch has a member named 'blob' that contains the strings in the batch: [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] 'blob' has type DataBuffer which is movable, so Impala might be able to get ownership of it. Or, at least we could copy the whole blob array instead of copying the strings one-by-one. ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 1.5.5. ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] It uses dictionary encoding for storing the values. Impala could copy/move the dictionary as well. was: Currently the ORC scanner allocates new memory for each string values (except for fixed size strings): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] Besides the two many allocations and copying it's also bad for memory locality. Since ORC-501 StringVectorBatch has a member named 'blob' that contains the strings in the batch: [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] 'blob' has type DataBuffer which is movable, so Impala might be able to get ownership of it. Or, at least we could copy the whole blob array instead of copying the strings one-by-one. ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 1.5.5. 
ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] It uses dictionary encoding for storing the values. Impala could copy/move the dictionary as well. > Improve string allocations of the ORC scanner > - > > Key: IMPALA-9226 > URL: https://issues.apache.org/jira/browse/IMPALA-9226 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: orc > > Currently the ORC scanner allocates new memory for each string values (except > for fixed size strings): > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] > Besides the too many allocations and copying it's also bad for memory > locality. > Since ORC-501 StringVectorBatch has a member named 'blob' that contains the > strings in the batch: > [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] > 'blob' has type DataBuffer which is movable, so Impala might be able to get > ownership of it. Or, at least we could copy the whole blob array instead of > copying the strings one-by-one. > ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC > 1.5.5. > ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: > [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] > It uses dictionary encoding for storing the values. Impala could copy/move > the dictionary as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
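The blob layout from ORC-501 described above can be illustrated with a small sketch. This is hypothetical Python, not the ORC C++ API: one contiguous buffer plus (offset, length) entries stands in for StringVectorBatch's blob, replacing a separate allocation and copy per string value:

```python
# Hypothetical sketch of the 'blob' idea: the batch keeps one contiguous
# buffer and per-value (offset, length) pairs, so a consumer can take or
# copy the whole buffer in a single operation instead of per-string copies.

blob = b"foobarbazlongerstring"
entries = [(0, 3), (3, 3), (6, 3), (9, 12)]   # (offset, length) per value

def value(i):
    """Read value i as a slice into the shared blob (no per-string alloc)."""
    off, ln = entries[i]
    return blob[off:off + ln]

assert value(0) == b"foo"
assert value(3) == b"longerstring"

# One bulk copy (or a move, for a movable DataBuffer) replaces
# len(entries) individual string allocations.
copied = bytes(blob)
assert copied == blob
```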
[jira] [Created] (IMPALA-9230) Retried runtime profile should include some information about previous query attempts
Sahil Takiar created IMPALA-9230: Summary: Retried runtime profile should include some information about previous query attempts Key: IMPALA-9230 URL: https://issues.apache.org/jira/browse/IMPALA-9230 Project: IMPALA Issue Type: Sub-task Components: Clients Reporter: Sahil Takiar We should consider adding the following to the runtime profiles of retried queries: * Include the reason why the previous query attempts failed * Include some basic runtime info about the previous queries: ** How long each failed query took to run ** Can consider including this in the query timeline as well -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9124) Transparently retry queries that fail due to cluster membership changes
[ https://issues.apache.org/jira/browse/IMPALA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9124 started by Sahil Takiar. > Transparently retry queries that fail due to cluster membership changes > --- > > Key: IMPALA-9124 > URL: https://issues.apache.org/jira/browse/IMPALA-9124 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Clients >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Critical > Attachments: Impala Transparent Query Retries.pdf > > > Currently, if the Impala Coordinator or any Executors run into errors during > query execution, Impala will fail the entire query. It would improve user > experience to transparently retry the query for some transient, recoverable > errors. > This JIRA focuses on retrying queries that would otherwise fail due to > cluster membership changes. Specifically, node failures that cause changes in > the cluster membership (currently the Coordinator cancels all queries running > on a node if it detects that the node is no longer part of the cluster) and > node blacklisting (the Coordinator blacklists a node because it detects a > problem with that node - can’t execute RPCs against the node). It is not > focused on retrying general errors (e.g. any frontend errors, > MemLimitExceeded exceptions, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure
[ https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-6788: - Priority: Major (was: Critical) > Abort ExecFInstance() RPC loop early after query failure > > > Key: IMPALA-6788 > URL: https://issues.apache.org/jira/browse/IMPALA-6788 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: krpc, rpc > Attachments: connect_thread_busy_queries_failing.txt, > impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip > > > Logs from a large cluster show that query startup can take a long time; then, > once startup completes, the query is cancelled because one of the > intermediate RPCs failed. > It is not clear what the right answer is, as fragments are started > asynchronously; possibly a timeout? > {code} > I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() > query_id=334cc7dd9758c36c:ec38aeb4 stmt=with customer_total_return as > I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 > backends for query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 > backends for query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() > query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() > query_id=334cc7dd9758c36c:ec38aeb4, tried to cancel 643 backends > I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control > resources for query_id=334cc7dd9758c36c:ec38aeb4 > {code} > {code} > I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() > query_id=e44d553b04d47cfb:28f06bb8 stmt=with customer_total_return as > I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 > backends for query_id=e44d553b04d47cfb:28f06bb8 > I0401 
21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 > backends for query_id=e44d553b04d47cfb:28f06bb8 > I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() > query_id=e44d553b04d47cfb:28f06bb8 > I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() > query_id=e44d553b04d47cfb:28f06bb8, tried to cancel 639 backends > I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control > resources for query_id=e44d553b04d47cfb:28f06bb8 > {code} > Checked the coordinator and threads appear to be spending lots of time > waiting on exec_complete_barrier_ > {code} > #0 0x7fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x01222944 in impala::Promise::Get() () > #2 0x01220d7b in impala::Coordinator::StartBackendExec() () > #3 0x01221c87 in impala::Coordinator::Exec() () > #4 0x00c3a925 in > impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest > const&) () > #5 0x00c41f7e in > impala::ClientRequestState::Exec(impala::TExecRequest*) () > #6 0x00bff597 in > impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, > std::shared_ptr, bool*, > std::shared_ptr*) () > #7 0x00c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, > std::shared_ptr, > std::shared_ptr*) () > #8 0x00c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, > beeswax::Query const&) () > /StartBackendExec > #11 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, > impala::ThreadDebugInfo const*, impala::Promise*), > boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::run() () > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure
[ https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-6788: - Priority: Critical (was: Major) > Abort ExecFInstance() RPC loop early after query failure > > > Key: IMPALA-6788 > URL: https://issues.apache.org/jira/browse/IMPALA-6788 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Assignee: Thomas Tauber-Marshall >Priority: Critical > Labels: krpc, rpc > Attachments: connect_thread_busy_queries_failing.txt, > impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip > > > Logs from a large cluster show that query startup can take a long time; then, > once startup completes, the query is cancelled because one of the > intermediate RPCs failed. > It is not clear what the right answer is, as fragments are started > asynchronously; possibly a timeout? > {code} > I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() > query_id=334cc7dd9758c36c:ec38aeb4 stmt=with customer_total_return as > I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 > backends for query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 > backends for query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() > query_id=334cc7dd9758c36c:ec38aeb4 > I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() > query_id=334cc7dd9758c36c:ec38aeb4, tried to cancel 643 backends > I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control > resources for query_id=334cc7dd9758c36c:ec38aeb4 > {code} > {code} > I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() > query_id=e44d553b04d47cfb:28f06bb8 stmt=with customer_total_return as > I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 > backends for query_id=e44d553b04d47cfb:28f06bb8 > I0401 
21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 > backends for query_id=e44d553b04d47cfb:28f06bb8 > I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() > query_id=e44d553b04d47cfb:28f06bb8 > I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() > query_id=e44d553b04d47cfb:28f06bb8, tried to cancel 639 backends > I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control > resources for query_id=e44d553b04d47cfb:28f06bb8 > {code} > Checked the coordinator and threads appear to be spending lots of time > waiting on exec_complete_barrier_ > {code} > #0 0x7fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x01222944 in impala::Promise::Get() () > #2 0x01220d7b in impala::Coordinator::StartBackendExec() () > #3 0x01221c87 in impala::Coordinator::Exec() () > #4 0x00c3a925 in > impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest > const&) () > #5 0x00c41f7e in > impala::ClientRequestState::Exec(impala::TExecRequest*) () > #6 0x00bff597 in > impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, > std::shared_ptr, bool*, > std::shared_ptr*) () > #7 0x00c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, > std::shared_ptr, > std::shared_ptr*) () > #8 0x00c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, > beeswax::Query const&) () > /StartBackendExec > #11 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, > impala::ThreadDebugInfo const*, impala::Promise*), > boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::run() () > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
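The early-abort idea in this issue (stop waiting on the remaining ExecFInstance() RPCs as soon as one fails, instead of letting startup run to completion before cancelling) can be sketched with Python's concurrent.futures. This is illustrative only, not coordinator code; the backend names and failure are invented:

```python
# Hypothetical sketch: start fragment-exec RPCs concurrently, stop
# waiting at the first failure, and cancel the not-yet-started ones.

from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait

def exec_finstance(backend):
    """Stand-in for an ExecFInstance() RPC to one backend."""
    if backend == "bad-node":
        raise RuntimeError("ExecFInstance() to bad-node failed")
    return backend

backends = ["node1", "bad-node"] + ["node%d" % i for i in range(2, 20)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(exec_finstance, b) for b in backends]
    # Return as soon as any RPC raises, rather than waiting for all.
    done, pending = wait(futures, return_when=FIRST_EXCEPTION)
    for f in pending:
        f.cancel()                       # abort RPCs that have not started

failed = [f for f in done if f.exception() is not None]
assert len(failed) == 1                  # exactly the bad-node RPC failed
```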
[jira] [Updated] (IMPALA-9229) Link failed and retried runtime profiles
[ https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9229: - Summary: Link failed and retried runtime profiles (was: Link runtime profiles from failed and retried queries) > Link failed and retried runtime profiles > > > Key: IMPALA-9229 > URL: https://issues.apache.org/jira/browse/IMPALA-9229 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > > There should be a way for clients to link the runtime profiles from failed > queries to all retry attempts (whether successful or not), and vice versa. > There are a few ways to do this: > * The simplest way would be to include the query id of the retried query in > the runtime profile of the failed query, and vice versa; users could then > manually create a chain of runtime profiles in order to fetch all failed / > successful attempts > * Extend TGetRuntimeProfileReq to include an option to fetch all runtime > profiles for the given query id + all retry attempts (or add a new Thrift > call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a > given query id) > * The Impala debug UI should include a simple way to view all the runtime > profiles of a query (the failed attempts + all retry attempts) side by side > (perhaps the query_profile?query_id profile should include tabs to easily > switch between the runtime profiles of each attempt) > These are not mutually exclusive, and it might be good to stage these changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9229) Link runtime profiles from failed and retried queries
Sahil Takiar created IMPALA-9229: Summary: Link runtime profiles from failed and retried queries Key: IMPALA-9229 URL: https://issues.apache.org/jira/browse/IMPALA-9229 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar There should be a way for clients to link the runtime profiles from failed queries to all retry attempts (whether successful or not), and vice versa. There are a few ways to do this: * The simplest way would be to include the query id of the retried query in the runtime profile of the failed query, and vice versa; users could then manually create a chain of runtime profiles in order to fetch all failed / successful attempts * Extend TGetRuntimeProfileReq to include an option to fetch all runtime profiles for the given query id + all retry attempts (or add a new Thrift call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a given query id) * The Impala debug UI should include a simple way to view all the runtime profiles of a query (the failed attempts + all retry attempts) side by side (perhaps the query_profile?query_id profile should include tabs to easily switch between the runtime profiles of each attempt) These are not mutually exclusive, and it might be good to stage these changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
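The first option above — embedding the retried query id in the failed query's runtime profile — implies a simple chain walk on the client side. A hedged sketch of that walk (the `ProfileEntry` record and `CollectRetryChain` helper are hypothetical illustrations, not part of Impala):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical profile record for one query attempt: a failed attempt
// stores the id of its retry, mirroring the simplest option described
// above (embedding the retried query id in the failed profile).
struct ProfileEntry {
  std::string query_id;
  std::string retried_query_id;  // empty if this attempt was not retried
};

// Follows the chain of retries from an original query id and returns the
// ids of every attempt, failed and successful, in order.
std::vector<std::string> CollectRetryChain(
    const std::map<std::string, ProfileEntry>& profiles, std::string id) {
  std::vector<std::string> chain;
  while (!id.empty() && profiles.count(id) > 0) {
    chain.push_back(id);
    id = profiles.at(id).retried_query_id;
  }
  return chain;
}
```

A Thrift-level alternative such as the proposed TGetRetryQueryIds(TQueryId) call would return this chain in one round trip instead of one profile fetch per attempt.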
[jira] [Updated] (IMPALA-9228) ORC scanner could be vectorized
[ https://issues.apache.org/jira/browse/IMPALA-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-9228: -- Description: The ORC scanner uses an external library to read ORC files. The library reads the file contents into its own memory representation. It is a vectorized representation similar to the Arrow format. Impala needs to convert the ORC row batch to an Impala row batch. Currently the conversion happens row-wise via virtual function calls: [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] Instead of this approach, it could work similarly to the Parquet scanner, which fills the columns one-by-one into a scratch batch and then evaluates the conjuncts on the scratch batch. For more details see HdfsParquetScanner::AssembleRows(): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] This way we'll need far fewer virtual function calls, and the memory reads/writes will be much more localized and predictable. was: The ORC scanner uses an external library to read ORC files. The library reads the file contents into its own memory representation. It is a vectorized representation similar to the Arrow format. Impala needs to convert the ORC row batch to an Impala row batch. 
Currently the conversion happens row-wise and value-by-value via virtual function calls: [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] Instead of this approach, it could work similarly to the Parquet scanner, which fills the columns one-by-one into a scratch batch and then evaluates the conjuncts on the scratch batch. For more details see HdfsParquetScanner::AssembleRows(): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] This way we'll need far fewer virtual function calls, and the memory reads/writes will be much more localized and predictable. > ORC scanner could be vectorized > --- > > Key: IMPALA-9228 > URL: https://issues.apache.org/jira/browse/IMPALA-9228 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: orc > > The ORC scanner uses an external library to read ORC files. The library > reads the file contents into its own memory representation. It is a > vectorized representation similar to the Arrow format. > Impala needs to convert the ORC row batch to an Impala row batch. Currently > the conversion happens row-wise via virtual function calls: > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] > Instead of this approach, it could work similarly to the Parquet scanner, which > fills the columns one-by-one into a scratch batch and then evaluates the > conjuncts on the scratch batch. 
For more details see > HdfsParquetScanner::AssembleRows(): > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] > This way we'll need far fewer virtual function calls, and the memory > reads/writes will be much more localized and predictable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
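The columnar approach described above can be sketched as follows. `OrcLongColumn` and the single `>=` predicate are simplified, hypothetical stand-ins for the ORC library's vector batches and Impala's scratch-batch conjunct evaluation; the point is only that both loops are tight and branch-predictable, with no per-value virtual dispatch:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one column of an ORC row batch; the real type
// would be something like orc::LongVectorBatch in the ORC library.
struct OrcLongColumn {
  std::vector<int64_t> data;
};

// Columnar conversion in the spirit of HdfsParquetScanner::AssembleRows():
// copy the column into a scratch batch in one tight loop, then evaluate
// the conjunct over the scratch batch, instead of converting and testing
// one value at a time through virtual calls.
std::vector<int64_t> FillScratchAndFilter(const OrcLongColumn& col,
                                          int64_t min_value) {
  std::vector<int64_t> scratch;
  scratch.reserve(col.data.size());
  for (int64_t v : col.data) scratch.push_back(v);  // column-wise fill
  std::vector<int64_t> out;
  for (int64_t v : scratch) {
    if (v >= min_value) out.push_back(v);  // conjunct evaluation pass
  }
  return out;
}
```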
[jira] [Updated] (IMPALA-9228) ORC scanner could be vectorized
[ https://issues.apache.org/jira/browse/IMPALA-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-9228: -- Description: The ORC scanner uses an external library to read ORC files. The library reads the file contents into its own memory representation. It is a vectorized representation similar to the Arrow format. Impala needs to convert the ORC row batch to an Impala row batch. Currently the conversion happens row-wise and value-by-value via virtual function calls: [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] Instead of this approach, it could work similarly to the Parquet scanner, which fills the columns one-by-one into a scratch batch and then evaluates the conjuncts on the scratch batch. For more details see HdfsParquetScanner::AssembleRows(): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] This way we'll need far fewer virtual function calls, and the memory reads/writes will be much more localized and predictable. was: The ORC scanner uses an external library to read ORC files. The library reads the file contents into its own memory representation. It is a vectorized representation similar to the Arrow format. Impala needs to convert the ORC row batch to an Impala row batch. 
Currently the conversion happens row-wise and column-by-column via a virtual function call: [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] Instead of this approach, it could work similarly to the Parquet scanner, which fills the columns one-by-one into a scratch batch and then evaluates the conjuncts on the scratch batch. For more details see HdfsParquetScanner::AssembleRows(): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] This way we'll need far fewer virtual function calls, and the memory reads/writes will be much more localized and predictable. > ORC scanner could be vectorized > --- > > Key: IMPALA-9228 > URL: https://issues.apache.org/jira/browse/IMPALA-9228 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: orc > > The ORC scanner uses an external library to read ORC files. The library > reads the file contents into its own memory representation. It is a > vectorized representation similar to the Arrow format. > Impala needs to convert the ORC row batch to an Impala row batch. Currently > the conversion happens row-wise and value-by-value via virtual function calls: > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] > Instead of this approach, it could work similarly to the Parquet scanner, which > fills the columns one-by-one into a scratch batch and then evaluates the > conjuncts on the scratch batch. 
For more details see > HdfsParquetScanner::AssembleRows(): > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] > This way we'll need far fewer virtual function calls, and the memory > reads/writes will be much more localized and predictable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9228) ORC scanner could be vectorized
Zoltán Borók-Nagy created IMPALA-9228: - Summary: ORC scanner could be vectorized Key: IMPALA-9228 URL: https://issues.apache.org/jira/browse/IMPALA-9228 Project: IMPALA Issue Type: Improvement Reporter: Zoltán Borók-Nagy The ORC scanner uses an external library to read ORC files. The library reads the file contents into its own memory representation. It is a vectorized representation similar to the Arrow format. Impala needs to convert the ORC row batch to an Impala row batch. Currently the conversion happens row-wise and column-by-column via a virtual function call: [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/hdfs-orc-scanner.cc#L671] [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L352] Instead of this approach, it could work similarly to the Parquet scanner, which fills the columns one-by-one into a scratch batch and then evaluates the conjuncts on the scratch batch. For more details see HdfsParquetScanner::AssembleRows(): [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1077-L1088] This way we'll need far fewer virtual function calls, and the memory reads/writes will be much more localized and predictable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9227) Test coverage for query retries when there is a network partition
Sahil Takiar created IMPALA-9227: Summary: Test coverage for query retries when there is a network partition Key: IMPALA-9227 URL: https://issues.apache.org/jira/browse/IMPALA-9227 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar The initial version of transparent query retries just adds coverage for retrying a query if an impalad crashes. Now that Impala has an RPC fault injection framework (IMPALA-8138) based on debug actions, integration tests can introduce network partitions between two impalad processes. Node blacklisting should cause the Impala Coordinator to blacklist the nodes with the network partitions (IMPALA-9137), and then transparent query retries should cause the query to be successfully retried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9226) Improve string allocations of the ORC scanner
Zoltán Borók-Nagy created IMPALA-9226: - Summary: Improve string allocations of the ORC scanner Key: IMPALA-9226 URL: https://issues.apache.org/jira/browse/IMPALA-9226 Project: IMPALA Issue Type: Improvement Reporter: Zoltán Borók-Nagy Currently the ORC scanner allocates new memory for each string value (except for fixed-size strings): https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172 Since ORC-501, StringVectorBatch has a member named 'blob' that contains the strings in the batch: [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] 'blob' has type DataBuffer, which is movable, so Impala might be able to get ownership of it. Or, at least, we could copy the whole blob array instead of copying the strings one-by-one. ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 1.5.5. ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] It uses dictionary encoding for storing the values. Impala could copy/move the dictionary as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
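The move-the-blob idea can be sketched like this. `StringBatch` is a simplified, hypothetical stand-in for ORC's StringVectorBatch/DataBuffer pair, used only to show why one buffer move beats per-value allocations; moving a std::vector does not relocate its heap storage, so views into the adopted buffer remain valid:

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for ORC's StringVectorBatch after
// ORC-501: one contiguous 'blob' buffer plus per-value offsets/lengths.
struct StringBatch {
  std::vector<char> blob;
  std::vector<size_t> offsets;
  std::vector<size_t> lengths;
};

// Result of adopting the batch: the moved blob plus views into it.
struct AdoptedStrings {
  std::vector<char> owned_blob;
  std::vector<std::string_view> values;
};

// Instead of allocating new memory per string value, take ownership of the
// whole blob with a single move; each value then becomes a view into the
// adopted buffer, with zero per-string copies.
AdoptedStrings AdoptBlob(StringBatch&& batch) {
  AdoptedStrings result;
  result.owned_blob = std::move(batch.blob);  // one move, zero string copies
  result.values.reserve(batch.offsets.size());
  for (size_t i = 0; i < batch.offsets.size(); ++i) {
    result.values.emplace_back(result.owned_blob.data() + batch.offsets[i],
                               batch.lengths[i]);
  }
  return result;
}
```

Falling back to copying the whole blob in one memcpy-like pass, as the issue also suggests, keeps the same per-value layout while avoiding one allocation per string.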
[jira] [Commented] (IMPALA-9222) Speed up show tables/databases if the user has access to parent db/server
[ https://issues.apache.org/jira/browse/IMPALA-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992495#comment-16992495 ] Csaba Ringhofer commented on IMPALA-9222: - https://gerrit.cloudera.org/#/c/14867/ > Speed up show tables/databases if the user has access to parent db/server > - > > Key: IMPALA-9222 > URL: https://issues.apache.org/jira/browse/IMPALA-9222 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > Currently we always do the auth check for tables/databases individually. If > the user has privileges higher in the hierarchy then it is not necessary to > do these checks as they will all succeed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9222) Speed up show tables/databases if the user has access to parent db/server
[ https://issues.apache.org/jira/browse/IMPALA-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9222 started by Csaba Ringhofer. --- > Speed up show tables/databases if the user has access to parent db/server > - > > Key: IMPALA-9222 > URL: https://issues.apache.org/jira/browse/IMPALA-9222 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > Currently we always do the auth check for tables/databases individually. If > the user has privileges higher in the hierarchy then it is not necessary to > do these checks as they will all succeed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
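The short-circuit described above — skipping per-table checks when the user already holds a privilege higher in the hierarchy — can be sketched as follows. This is an illustrative C++ sketch with hypothetical names; Impala's frontend authorization code is actually Java, and the real check consults Ranger/Sentry policies rather than a boolean flag:

```cpp
#include <functional>
#include <string>
#include <vector>

// If the user holds the privilege on the parent db/server, every
// per-table authorization check would succeed anyway, so skip them all;
// otherwise fall back to checking each table individually.
std::vector<std::string> FilterVisibleTables(
    bool has_parent_privilege, const std::vector<std::string>& tables,
    const std::function<bool(const std::string&)>& table_check) {
  if (has_parent_privilege) return tables;  // short-circuit: no per-table checks
  std::vector<std::string> visible;
  for (const auto& t : tables) {
    if (table_check(t)) visible.push_back(t);
  }
  return visible;
}
```

For a SHOW TABLES over a database with thousands of tables, the short-circuit replaces thousands of individual policy evaluations with a single parent-level check.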
[jira] [Updated] (IMPALA-9010) Support pre-defined mask types from Ranger UI
[ https://issues.apache.org/jira/browse/IMPALA-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-9010: --- Description: Review Hive implementation/behavior. Redact/Partial/Hash/Nullify/Unmasked/Date These will be implemented as static SQL transforms in Impala To be specific, we need to implement 6 builtin functions: * mask: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java] * mask_hash: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskHash.java] * mask_first_n: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskFirstN.java] * mask_last_n: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskLastN.java] * mask_show_first_n: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowFirstN.java] * mask_show_last_n: [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowLastN.java] These are Hive GenericUDFs, which Impala can't use, so we have to create our own builtin functions. was: Review Hive implementation/behavior. Redact/Partial/Hash/Nullify/Unmasked/Date These will be implemented as static SQL transforms in Impala > Support pre-defined mask types from Ranger UI > - > > Key: IMPALA-9010 > URL: https://issues.apache.org/jira/browse/IMPALA-9010 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Kurt Deschler >Assignee: Fang-Yu Rao >Priority: Critical > > Review Hive implementation/behavior. 
> Redact/Partial/Hash/Nullify/Unmasked/Date > These will be implemented as static SQL transforms in Impala > To be specific, we need to implement 6 builtin functions: > * mask: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java] > * mask_hash: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskHash.java] > * mask_first_n: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskFirstN.java] > * mask_last_n: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskLastN.java] > * mask_show_first_n: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowFirstN.java] > * mask_show_last_n: > [https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowLastN.java] > These are Hive GenericUDFs, which Impala can't use, so we have to create our > own builtin functions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
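As a rough illustration of the semantics these builtins carry, here is a sketch of mask_show_first_n-style masking, assuming Hive's default replacement characters (uppercase letters become 'X', lowercase become 'x', digits become 'n'). This is an illustration of the behavior Impala would need to match, not Impala's eventual implementation:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Shows the first n characters unmasked and masks the rest, following
// Hive's default masking characters (assumption: upper -> 'X',
// lower -> 'x', digit -> 'n'; other characters pass through unchanged).
std::string MaskShowFirstN(const std::string& s, size_t n) {
  std::string out = s;
  for (size_t i = n; i < out.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(out[i]);
    if (std::isupper(c)) out[i] = 'X';
    else if (std::islower(c)) out[i] = 'x';
    else if (std::isdigit(c)) out[i] = 'n';
  }
  return out;
}
```

For example, masking a card-like value while showing the first four characters would turn "Card1234" into "Cardnnnn".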