[jira] [Created] (IMPALA-8116) Impala Doc: Create Impala Limitations doc

2019-01-24 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8116:
---

 Summary: Impala Doc: Create Impala Limitations doc
 Key: IMPALA-8116
 URL: https://issues.apache.org/jira/browse/IMPALA-8116
 Project: IMPALA
  Issue Type: Improvement
  Components: Docs
Affects Versions: Impala 3.1.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni


Create a separate document that focuses on design limitations more than bugs. 
It could also include functional limitations like "cannot write nested types", 
etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8090) DiskIoMgrTest.SyncReadTest hits file_ != nullptr DCHECK in LocalFileReader::ReadFromPos()

2019-01-24 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8090.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> DiskIoMgrTest.SyncReadTest hits file_ != nullptr DCHECK in 
> LocalFileReader::ReadFromPos()
> -
>
> Key: IMPALA-8090
> URL: https://issues.apache.org/jira/browse/IMPALA-8090
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: David Knupp
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> *Test output*:
> {noformat}
> 45/99 Test #45: disk-io-mgr-test .***Exception: Other 43.29 
> sec
> Turning perftools heap leak checking off
> [==] Running 25 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 25 tests from DiskIoMgrTest
> [ RUN  ] DiskIoMgrTest.SingleWriter
> 19/01/16 15:57:09 INFO util.JvmPauseMonitor: Starting JVM pause monitor
> [   OK ] DiskIoMgrTest.SingleWriter (3407 ms)
> [ RUN  ] DiskIoMgrTest.InvalidWrite
> [   OK ] DiskIoMgrTest.InvalidWrite (281 ms)
> [ RUN  ] DiskIoMgrTest.WriteErrors
> [   OK ] DiskIoMgrTest.WriteErrors (235 ms)
> [ RUN  ] DiskIoMgrTest.SingleWriterCancel
> [   OK ] DiskIoMgrTest.SingleWriterCancel (1165 ms)
> [ RUN  ] DiskIoMgrTest.SingleReader
> [   OK ] DiskIoMgrTest.SingleReader (5835 ms)
> [ RUN  ] DiskIoMgrTest.SingleReaderSubRanges
> [   OK ] DiskIoMgrTest.SingleReaderSubRanges (16404 ms)
> [ RUN  ] DiskIoMgrTest.AddScanRangeTest
> [   OK ] DiskIoMgrTest.AddScanRangeTest (1210 ms)
> [ RUN  ] DiskIoMgrTest.SyncReadTest
> *** Check failure stack trace: ***
> @  0x4825dcc
> @  0x4827671
> @  0x48257a6
> @  0x4828d6d
> @  0x1af39ec
> @  0x1ae90a4
> @  0x1ac30ea
> @  0x1accad3
> @  0x1acc660
> @  0x1acbf3e
> @  0x1acb62d
> @  0x1b03671
> @  0x1f79988
> @  0x1f82b60
> @  0x1f82a84
> @  0x1f82a47
> @  0x3751579
> @   0x3ea4807850
> @   0x3ea44e894c
> Wrote minidump to 
> /data/jenkins/workspace/<...>/repos/Impala/logs/be_tests/minidumps/disk-io-mgr-test/5bbf76f7-e5d6-4ac9-bdae9d9b-065c32ec.dmp
> {noformat}
> *Error*:
> {noformat}
> Operating system: Linux
>   0.0.0 Linux 2.6.32-358.14.1.el6.centos.plus.x86_64 #1 SMP 
> Tue Jul 16 21:33:24 UTC 2013 x86_64
> CPU: amd64
>  family 6 model 45 stepping 7
>  8 CPUs
> GPU: UNKNOWN
> Crash reason:  SIGABRT
> Crash address: 0x4522fa1
> Process uptime: not available
> Thread 205 (crashed)
>  0  libc-2.12.so + 0x328e5
> rax = 0x   rdx = 0x0006
> rcx = 0x   rbx = 0x06adf9c0
> rsi = 0x0563   rdi = 0x2fa1
> rbp = 0x7f8009b8ffe0   rsp = 0x7f8009b8fc78
>  r8 = 0x7f8009b8fd00r9 = 0x0563
> r10 = 0x0008   r11 = 0x0202
> r12 = 0x06adfa40   r13 = 0x001f
> r14 = 0x06ae7384   r15 = 0x06adf9c0
> rip = 0x003ea44328e5
> Found by: given as instruction pointer in context
>  1  libc-2.12.so + 0x340c5
> rbp = 0x7f8009b8ffe0   rsp = 0x7f8009b8fc80
> rip = 0x003ea44340c5
> Found by: stack scanning
>  2  disk-io-mgr-test!boost::_bi::bind_t impala::io::DiskQueue, impala::io::DiskIoMgr*>, 
> boost::_bi::list2, 
> boost::_bi::value > >::operator()() 
> [bind_template.hpp : 20 + 0x21]
> rbp = 0x7f8009b8ffe0   rsp = 0x7f8009b8fc88
> rip = 0x01acbf3e
> Found by: stack scanning
>  3  disk-io-mgr-test!google::LogMessage::Flush() + 0x157
> rbx = 0x0007   rbp = 0x06adf980
> rsp = 0x7f8009b8fff0   rip = 0x048257a7
> Found by: call frame info
>  4  disk-io-mgr-test!google::LogMessageFatal::~LogMessageFatal() + 0xe
> rbx = 0x7f8009b90110   rbp = 0x7f8009b903f0
> rsp = 0x7f8009b90070   r12 = 0x0001
> r13 = 0x06aee8b8   r14 = 0x0c213538
> r15 = 0x0007   rip = 0x04828d6e
> Found by: call frame info
>  5  disk-io-mgr-test!impala::io::LocalFileReader::ReadFromPos(long, unsigned 
> char*, long, long*, bool*) [local-file-reader.cc : 67 + 0x10]
> rbx = 0x0001   rbp = 0x7f8009b903f0
> rsp = 0x7f8009b90090   r12 = 0x0001
> r13 = 0x06aee8b8   r14 = 0x0c213538
> r15 = 0x0007   rip = 0x01af39ed
> Found by: call frame info

[jira] [Resolved] (IMPALA-8107) Support EXEC_TIME_LIMIT_S in resource pool setting

2019-01-24 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-8107.

Resolution: Resolved

Already supported by IMPALA-2538. Closing this JIRA.

> Support EXEC_TIME_LIMIT_S in resource pool setting
> --
>
> Key: IMPALA-8107
> URL: https://issues.apache.org/jira/browse/IMPALA-8107
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: admission-control
>
> Timeout limits should differ across kinds of queries. For example, a 
> resource pool for ad-hoc queries may set EXEC_TIME_LIMIT_S to 60s, while a 
> resource pool for building pre-aggregations or other ETL may need a larger 
> EXEC_TIME_LIMIT_S such as 30 minutes.





[jira] [Created] (IMPALA-8115) some jenkins workers slow to spawn due to dpkg lock conflicts

2019-01-24 Thread Michael Brown (JIRA)
Michael Brown created IMPALA-8115:
-

 Summary: some jenkins workers slow to spawn due to dpkg lock 
conflicts
 Key: IMPALA-8115
 URL: https://issues.apache.org/jira/browse/IMPALA-8115
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Michael Brown


A Jenkins worker for label {{ubuntu-16.04}} took about 15 minutes to start 
doing real work. I noticed that it was retrying {{apt-get update}}:
{noformat}
++ sudo apt-get --yes install openjdk-8-jdk
E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily 
unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another 
process using it?
++ date
Thu Jan 24 23:37:33 UTC 2019
++ sudo apt-get update
++ sleep 10
++ sudo apt-get --yes install openjdk-8-jdk
[etc]
{noformat}
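The blind sleep-and-retry loop above could instead wait on the lock directly: apt and dpkg serialize via an fcntl lock on /var/lib/dpkg/lock. A minimal sketch of that idea (the path, timeout, and poll interval are illustrative assumptions, not what the Jenkins bootstrap scripts actually do):

```python
import errno
import fcntl
import time

def wait_for_dpkg_lock(path="/var/lib/dpkg/lock", timeout_s=900, poll_s=10):
    """Poll until the dpkg lock can be taken, then release it and return True.

    Mirrors what apt-get does internally: attempt a non-blocking fcntl lock
    and back off while unattended-upgrade (or similar) is holding it.
    """
    deadline = time.time() + timeout_s
    while True:
        try:
            with open(path, "w") as f:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                fcntl.lockf(f, fcntl.LOCK_UN)
            return True  # lock was free; safe to run apt-get now
        except OSError as e:
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                raise  # unexpected error, not a lock conflict
            if time.time() > deadline:
                return False  # gave up; lock still held
            time.sleep(poll_s)
```

Running this before the first `apt-get` call would turn the 15-minute silent retry into an explicit, bounded wait.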

I ssh'd into a host and confirmed that, yes, something else was holding the 
dpkg lock (verified with lsof, output not pasted here; dpkg process PID 11459 
was the culprit):

{noformat}
root   1750  0.0  0.0   4508  1664 ?Ss   23:21   0:00 /bin/sh 
/usr/lib/apt/apt.systemd.daily
root   1804 12.3  0.1 141076 80452 ?S23:22   1:24  \_ 
/usr/bin/python3 /usr/bin/unattended-upgrade
root   3263  0.0  0.1 140960 72896 ?S23:23   0:00  \_ 
/usr/bin/python3 /usr/bin/unattended-upgrade
root  11459  0.6  0.0  45920 25184 pts/1Ss+  23:24   0:03  \_ 
/usr/bin/dpkg --status-fd 10 --unpack --auto-deconfigure 
/var/cache/apt/archives/tzdata_2018i-0ubuntu0.16.04_all.deb 
/var/cache/apt/archives/distro-info-data_0.28ubuntu0.9_all.deb 
/var/cache/apt/archives/file_1%3a5.25-2ubuntu1.1_amd64.deb 
/var/cache/apt/archives/libmagic1_1%3a5.25-2ubuntu1.1_amd64.deb 
/var/cache/apt/archives/libisc-export160_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb
 
/var/cache/apt/archives/libdns-export162_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb
 /var/cache/apt/archives/isc-dhcp-client_4.3.3-5ubuntu12.9_amd64.deb 
/var/cache/apt/archives/isc-dhcp-common_4.3.3-5ubuntu12.9_amd64.deb 
/var/cache/apt/archives/libidn11_1.32-3ubuntu1.2_amd64.deb 
/var/cache/apt/archives/libpng12-0_1.2.54-1ubuntu1.1_amd64.deb 
/var/cache/apt/archives/libtasn1-6_4.7-3ubuntu0.16.04.3_amd64.deb 
/var/cache/apt/archives/libapparmor-perl_2.10.95-0ubuntu2.10_amd64.deb 
/var/cache/apt/archives/apparmor_2.10.95-0ubuntu2.10_amd64.deb 
/var/cache/apt/archives/curl_7.47.0-1ubuntu2.11_amd64.deb 
/var/cache/apt/archives/libgssapi-krb5-2_1.13.2+dfsg-5ubuntu2.1_amd64.deb 
/var/cache/apt/archives/libkrb5-3_1.13.2+dfsg-5ubuntu2.1_amd64.deb 
/var/cache/apt/archives/libkrb5support0_1.13.2+dfsg-5ubuntu2.1_amd64.deb 
/var/cache/apt/archives/libk5crypto3_1.13.2+dfsg-5ubuntu2.1_amd64.deb 
/var/cache/apt/archives/libcurl3-gnutls_7.47.0-1ubuntu2.11_amd64.deb 
/var/cache/apt/archives/apt-transport-https_1.2.29ubuntu0.1_amd64.deb 
/var/cache/apt/archives/libicu55_55.1-7ubuntu0.4_amd64.deb 
/var/cache/apt/archives/libxml2_2.9.3+dfsg1-1ubuntu0.6_amd64.deb 
/var/cache/apt/archives/bind9-host_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/dnsutils_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/libisc160_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/libdns162_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/libisccc140_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/libisccfg140_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/liblwres141_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/libbind9-140_1%3a9.10.3.dfsg.P4-8ubuntu1.11_amd64.deb 
/var/cache/apt/archives/openssl_1.0.2g-1ubuntu4.14_amd64.deb 
/var/cache/apt/archives/ca-certificates_20170717~16.04.1_all.deb 
/var/cache/apt/archives/libasprintf0v5_0.19.7-2ubuntu3.1_amd64.deb 
/var/cache/apt/archives/gettext-base_0.19.7-2ubuntu3.1_amd64.deb 
/var/cache/apt/archives/krb5-locales_1.13.2+dfsg-5ubuntu2.1_all.deb 
/var/cache/apt/archives/libelf1_0.165-3ubuntu1.1_amd64.deb 
/var/cache/apt/archives/libglib2.0-data_2.48.2-0ubuntu4.1_all.deb 
/var/cache/apt/archives/libnuma1_2.0.11-1ubuntu1.1_amd64.deb 
/var/cache/apt/archives/libpolkit-gobject-1-0_0.105-14.1ubuntu0.4_amd64.deb 
/var/cache/apt/archives/libx11-data_2%3a1.6.3-1ubuntu2.1_all.deb 
/var/cache/apt/archives/libx11-6_2%3a1.6.3-1ubuntu2.1_amd64.deb 
/var/cache/apt/archives/openssh-sftp-server_1%3a7.2p2-4ubuntu2.6_amd64.deb 
/var/cache/apt/archives/openssh-server_1%3a7.2p2-4ubuntu2.6_amd64.deb 
/var/cache/apt/archives/openssh-client_1%3a7.2p2-4ubuntu2.6_amd64.deb 
/var/cache/apt/archives/rsync_3.1.1-3ubuntu1.2_amd64.deb 
/var/cache/apt/archives/tcpdump_4.9.2-0ubuntu0.16.04.1_amd64.deb 
/var/cache/apt/archives/wget_1.17.1-1ubuntu1.4_amd64.deb 
/var/cache/apt/archives/python3-problem-report_2.20.1-0ubuntu2.18_all.deb 
/var/cache/apt/archives/python3-apport_2.20.1-0ubuntu2.18_all.deb 
/var/cache/apt/archives/apport_2.20.1-0ubuntu2

[jira] [Created] (IMPALA-8114) Build test failure in test_breakpad.py

2019-01-24 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8114:
---

 Summary: Build test failure in test_breakpad.py
 Key: IMPALA-8114
 URL: https://issues.apache.org/jira/browse/IMPALA-8114
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Tim Armstrong


Recent builds have failed due to a failure in {{test_breakpad.py}}. Assigning 
to Tim as the person who most recently touched this file.

Test output:

{noformat}
09:04:35  ERRORS 

09:04:35 ___ ERROR at teardown of 
TestBreakpadExhaustive.test_minidump_cleanup_thread ___
09:04:35 custom_cluster/test_breakpad.py:49: in teardown_method
09:04:35 self.kill_cluster(SIGKILL)
09:04:35 custom_cluster/test_breakpad.py:80: in kill_cluster
09:04:35 self.kill_processes(processes, signal)
09:04:35 custom_cluster/test_breakpad.py:85: in kill_processes
09:04:35 process.kill(signal)
09:04:35 common/impala_cluster.py:330: in kill
09:04:35 assert 0, "No processes %s found" % self.cmd
09:04:35 E   AssertionError: No processes 
['/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/impalad',
 '-kudu_client_rpc_timeout_ms', '0', '-kudu_master_hosts', 'localhost', 
'--mem_limit=12884901888', '-logbufsecs=5', '-v=1', '-max_log_files=0', 
'-log_filename=impalad', 
'-log_dir=/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/logs/custom_cluster_tests',
 '-beeswax_port=21000', '-hs2_port=21050', '-be_port=22000', 
'-krpc_port=27000', '-state_store_subscriber_port=23000', 
'-webserver_port=25000', '-max_minidumps=2', '-logbufsecs=1', 
'-minidump_path=/tmp/tmpKaSw_w', '--default_query_options='] found
{noformat}

Distilled {{TEST-impala-custom-cluster.xml}} output:

{noformat}
-- 2019-01-23 08:00:43,585 INFO MainThread: Found 3 impalad/1 statestored/1 
catalogd process(es)
…
-- 2019-01-23 08:00:43,667 INFO MainThread: Killing: 
/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/statestored
 -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored 
-log_dir=/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/logs/custom_cluster_tests
 -max_minidumps=2 -logbufsecs=1 -minidump_path=/tmp/tmpKaSw_w (PID: 16809) with 
signal 10
-- 2019-01-23 08:00:43,692 INFO MainThread: Found 6 impalad/1 statestored/1 
catalogd process(es)
...
E   AssertionError: No processes 
['/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/impalad
{noformat}

Notice that the main thread appears to be killing the statestore, but fails to 
kill impalad. Notice also that a message saying all impalads are running 
appears in the midst of the code that tries to shut down the cluster. Is this 
test multi-threaded? Is there more than one “main thread”? Are these main 
threads working at cross purposes? What recent change may have caused this?

Also, it looks like the script is sending signal 10 (SIGUSR1) while the 
statestore (in its log) says it got a SIGTERM (15):

{noformat}
I0123 08:00:44.086009 16868 thrift-client.cc:78] Couldn't open transport for 
impala-ec2-centoCaught signal: SIGTERM. Daemon will exit.
{noformat}
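For reference, the numeric mapping can be checked directly from Python; the values below are the usual x86-64 Linux ones, where 10 is SIGUSR1 and 15 is SIGTERM (numbers differ on other platforms):

```python
import signal

# On x86-64 Linux: SIGUSR1 == 10, SIGTERM == 15. If the test harness sends
# signal 10 but the statestore logs SIGTERM, either something in between is
# translating the signal or a different sender is involved.
for num in (10, 15):
    try:
        print(num, "->", signal.Signals(num).name)
    except ValueError:
        print(num, "-> unknown on this platform")
```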

Not terribly familiar with this area of the product, so bumping it over to the 
BE team.





[jira] [Created] (IMPALA-8113) test_aggregation and test_avro_primitive_in_list fail in S3

2019-01-24 Thread Michael Brown (JIRA)
Michael Brown created IMPALA-8113:
-

 Summary: test_aggregation and test_avro_primitive_in_list fail in 
S3
 Key: IMPALA-8113
 URL: https://issues.apache.org/jira/browse/IMPALA-8113
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Michael Brown
Assignee: Michael Brown


Likely more victims of our infra in S3.
{noformat}
query_test/test_aggregation.py:138: in test_aggregation
result = self.execute_query(query, vector.get_value('exec_option'))
common/impala_test_suite.py:597: in wrapper
return function(*args, **kwargs)
common/impala_test_suite.py:628: in execute_query
return self.__execute_query(self.client, query, query_options)
common/impala_test_suite.py:695: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:174: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:182: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:359: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:380: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error: Error reading from HDFS file: 
s3a://impala-test-uswest2-1/test-warehouse/alltypesagg_parquet/year=2010/month=1/day=8/5642b2da93dae1ad-494132e5_592013737_data.0.parq
E   Error(255): Unknown error 255
E   Root cause: SdkClientException: Data read has a different length than the 
expected: dataLength=0; expectedLength=45494; includeSkipped=true; 
in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
resetCount=0
{noformat}

{noformat}
query_test/test_nested_types.py:263: in test_avro_primitive_in_list
"AvroPrimitiveInList.parquet", vector)
query_test/test_nested_types.py:287: in __test_primitive_in_list
result = self.execute_query("select item from %s.col1" % full_name, qopts)
common/impala_test_suite.py:597: in wrapper
return function(*args, **kwargs)
common/impala_test_suite.py:628: in execute_query
return self.__execute_query(self.client, query, query_options)
common/impala_test_suite.py:695: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:174: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:182: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:359: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:380: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error: Failed to open HDFS file 
s3a://impala-test-uswest2-1/test-warehouse/test_avro_primitive_in_list_38f182c4.db/AvroPrimitiveInList/AvroPrimitiveInList.parquet
E   Error(2): No such file or directory
E   Root cause: FileNotFoundException: No such file or directory: 
s3a://impala-test-uswest2-1/test-warehouse/test_avro_primitive_in_list_38f182c4.db/AvroPrimitiveInList/AvroPrimitiveInList.parquet
{noformat}





[jira] [Created] (IMPALA-8112) test_cancel_select with debug action failed with unexpected error

2019-01-24 Thread Michael Brown (JIRA)
Michael Brown created IMPALA-8112:
-

 Summary: test_cancel_select with debug action failed with 
unexpected error
 Key: IMPALA-8112
 URL: https://issues.apache.org/jira/browse/IMPALA-8112
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.2.0
Reporter: Michael Brown
Assignee: Andrew Sherman


Stacktrace
{noformat}
query_test/test_cancellation.py:241: in test_cancel_select
self.execute_cancel_test(vector)
query_test/test_cancellation.py:213: in execute_cancel_test
assert 'Cancelled' in str(thread.fetch_results_error)
E   assert 'Cancelled' in "ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: Unable to open Kudu table: 
Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected 
(error 107)\n"
E+  where "ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: Unable to open Kudu table: 
Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected 
(error 107)\n" = str(ImpalaBeeswaxException())
E+where ImpalaBeeswaxException() = .fetch_results_error
{noformat}

Standard Error
{noformat}
SET 
client_identifier=query_test/test_cancellation.py::TestCancellationParallel::()::test_cancel_select[protocol:beeswax|table_format:kudu/none|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'debug_action;
-- executing against localhost:21000
use tpch_kudu;

-- 2019-01-18 17:50:03,100 INFO MainThread: Started query 
4e4b3ab4cc7d:11efc3f5
SET 
client_identifier=query_test/test_cancellation.py::TestCancellationParallel::()::test_cancel_select[protocol:beeswax|table_format:kudu/none|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'debug_action;
SET batch_size=0;
SET num_nodes=0;
SET disable_codegen_rows_threshold=0;
SET disable_codegen=False;
SET abort_on_error=1;
SET cpu_limit_s=10;
SET debug_action=0:GETNEXT:WAIT|COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL;
SET exec_single_node_rows_threshold=0;
SET buffer_pool_limit=0;
-- executing async: localhost:21000
select l_returnflag from lineitem;

-- 2019-01-18 17:50:03,139 INFO MainThread: Started query 
fa4ddb9e62a01240:54c86ad
SET 
client_identifier=query_test/test_cancellation.py::TestCancellationParallel::()::test_cancel_select[protocol:beeswax|table_format:kudu/none|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'debug_action;
-- connecting to: localhost:21000
-- fetching results from: 
-- getting state for operation: 
-- canceling operation: 
-- 2019-01-18 17:50:08,196 INFO Thread-4: Starting new HTTP connection (1): 
localhost
-- closing query for operation handle: 

{noformat}

[~asherman] please take a look since it looks like you touched code around this 
area last.





[jira] [Created] (IMPALA-8111) Document workaround for some authentication issues with KRPC

2019-01-24 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8111:
--

 Summary: Document workaround for some authentication issues with 
KRPC
 Key: IMPALA-8111
 URL: https://issues.apache.org/jira/browse/IMPALA-8111
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Affects Versions: Impala 3.1.0, Impala 2.12.0
Reporter: Michael Ho
Assignee: Alex Rodoni


There have been complaints from users about not being able to use Impala after 
upgrading to an Impala version with KRPC enabled, due to authentication 
issues. Please document them in the known issues or best practices guide.

1. https://issues.apache.org/jira/browse/IMPALA-7585:
 *Symptoms*: When using Impala with LDAP enabled, a user may hit the following:
{noformat}
Not authorized: Client connection negotiation failed: client connection to 
127.0.0.1:27000: SASL(-1): generic failure: All-whitespace username.
{noformat}
*Root cause*: The following sequence can lead to the user "impala" not being 
created in /etc/passwd.
{quote}time 1: no impala in LDAP; things get installed; impala created in 
/etc/passwd
 time 2: impala added to LDAP
 time 3: new machine added
{quote}
*Workaround*:
 - Manually edit /etc/passwd to add the impala user
 - Upgrade to a version of Impala with the patch IMPALA-7585

2. https://issues.apache.org/jira/browse/IMPALA-7298
 *Symptoms*: When running with Kerberos enabled, a user may hit the following 
error:
{noformat}
WARNINGS: TransmitData() to X.X.X.X:27000 failed: Not authorized: Client 
connection negotiation failed: client connection to X.X.X.X:27000: Server 
impala/x.x@vpc.cloudera.com not found in Kerberos database
{noformat}
*Root cause*:
 KrpcDataStreamSender passes a resolved IP address when creating a proxy. 
Instead, we should pass both the resolved address and the hostname when 
creating the proxy so that we won't end up using the IP address as the hostname 
in the Kerberos principal.

*Workaround*:
 - Set rdns=true in /etc/krb5.conf
 - Upgrade to a version of Impala with the fix of IMPALA-7298
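Roughly, what rdns=true restores is this: the client reverse-resolves the IP back to a hostname before building the service principal, so the raw IP never ends up in the principal. The function below is an illustrative sketch, not Impala's or libkrb5's actual code:

```python
import socket

def service_principal(service, addr, realm):
    """Build a Kerberos-style service principal for a resolved address.

    With rdns enabled, the Kerberos libraries reverse-resolve the IP to a
    hostname; without it, the raw IP lands where the hostname belongs and
    the KDC lookup fails ("Server impala/x.x@REALM not found").
    """
    try:
        host = socket.gethostbyaddr(addr)[0]  # reverse DNS, like rdns=true
    except OSError:
        host = addr  # no PTR record: falls back to the IP (the failure mode)
    return "%s/%s@%s" % (service, host, realm)
```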

3. https://issues.apache.org/jira/browse/KUDU-2198
 *Symptoms*: When running with Kerberos enabled, a user may hit the following 
error message where  is some random string which doesn't match 
the primary in the Kerberos principal
{noformat}
WARNINGS: TransmitData() to X.X.X.X:27000 failed: Remote error: Not authorized: 
{username='', principal='impala/redacted'} is not allowed to 
access DataStreamService
{noformat}
*Root cause*:
 Due to system "auth_to_local" mapping, the principal may be mapped to some 
local name.

*Workaround*:
 - Start Impala with the flag {{--use_system_auth_to_local=false}}





[jira] [Resolved] (IMPALA-7832) Support IF NOT EXISTS in alter table add columns

2019-01-24 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-7832.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Support IF NOT EXISTS in alter table add columns
> 
>
> Key: IMPALA-7832
> URL: https://issues.apache.org/jira/browse/IMPALA-7832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 3.2.0
>
>
> alter table  add [if not exists] columns (  [,  
> ...])
> would add the column only if a column of the same name does not already exist
> Probably worth checking out what other databases do in different situations, 
> eg. if the column already exists but with a different type, if "replace" is 
> used instead of "add", etc.





[jira] [Created] (IMPALA-8110) Parquet stat filtering does not handle narrowed int types correctly

2019-01-24 Thread Csaba Ringhofer (JIRA)
Csaba Ringhofer created IMPALA-8110:
---

 Summary: Parquet stat filtering does not handle narrowed int types 
correctly
 Key: IMPALA-8110
 URL: https://issues.apache.org/jira/browse/IMPALA-8110
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Csaba Ringhofer


Impala can read int32 Parquet columns as tinyint/smallint SQL columns. If the 
value does not fit into the 8/16 bit signed int's range, the value will 
overflow; e.g. writing 128 as int32 and then rereading it as int8 will return 
-128. This is normal as far as I understand, but min/max stat filtering does 
not handle this case correctly:

{code:java}
create table tnarrow (i int) stored as parquet;
insert into tnarrow values (1), (201);
alter table tnarrow change column i i tinyint;

set PARQUET_READ_STATISTICS=0;
select * from tnarrow where i < 0;
-- returns 1 row: -56

set PARQUET_READ_STATISTICS=1;
select * from tnarrow where i < 0;
-- returns 0 rows
{code}
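The wrap-around behind this is plain two's-complement truncation, which is why the file's int32 stats (min=1, max=201 in the example above) no longer bound the narrowed values. An illustrative sketch:

```python
def to_int8(v):
    """Reinterpret the low 8 bits of an int32 as a signed int8."""
    v &= 0xFF
    return v - 256 if v >= 128 else v

# The file's int32 column stats: min=1, max=201. Judged against those
# stats, the predicate "i < 0" can match no row, so the row group is
# skipped when PARQUET_READ_STATISTICS=1...
stats_min, stats_max = 1, 201
assert not (stats_min < 0)

# ...but after narrowing the column to TINYINT, the stored values wrap:
assert to_int8(128) == -128   # the prose example above
assert to_int8(201) < 0       # 201 wraps negative, so "i < 0" does match
```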





[jira] [Created] (IMPALA-8109) Impala cannot read the gzip files bigger than 2 GB

2019-01-24 Thread hakki (JIRA)
hakki created IMPALA-8109:
-

 Summary: Impala cannot read the gzip files bigger than 2 GB
 Key: IMPALA-8109
 URL: https://issues.apache.org/jira/browse/IMPALA-8109
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.12.0
Reporter: hakki


When querying a partition containing gzip files, the query fails with the 
error below:
{noformat}
WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: 
hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz: 
Error(255): Unknown error 255
Root cause: EOFException: Cannot seek to negative offset
{noformat}

The hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz file has 
a size bigger than 2 GB (approx. 2.4 GB).
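The negative seek offset is the classic signature of a file size or offset being stored in a signed 32-bit field. An illustrative sketch of the wrap:

```python
def as_int32(v):
    """Truncate a Python int to a signed 32-bit value (two's complement)."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

# A ~2.4 GB length no longer fits in a signed 32-bit offset and goes negative:
assert as_int32(2_400_000_000) < 0

# The -2147483648 in the error message is exactly INT32_MIN, i.e. 2**31
# wrapped, consistent with some offset computation overflowing int32:
assert as_int32(2**31) == -2147483648
```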





[jira] [Created] (IMPALA-8108) Impala query returns TIMESTAMP values in different types

2019-01-24 Thread Robbie Zhang (JIRA)
Robbie Zhang created IMPALA-8108:


 Summary: Impala query returns TIMESTAMP values in different types
 Key: IMPALA-8108
 URL: https://issues.apache.org/jira/browse/IMPALA-8108
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Robbie Zhang


When a timestamp has a fractional part of .000, .00, or .0 (i.e. when the 
fraction is all zeros), the timestamp is displayed with no fraction of a 
second. For example:
{code:java}
select cast(ts as timestamp) from 
 (values 
 ('2019-01-11 10:40:18' as ts),
 ('2019-01-11 10:40:19.0'),
 ('2019-01-11 10:40:19.00'), 
 ('2019-01-11 10:40:19.000'),
 ('2019-01-11 10:40:19.'),
 ('2019-01-11 10:40:19.0'),
 ('2019-01-11 10:40:19.00'),
 ('2019-01-11 10:40:19.000'),
 ('2019-01-11 10:40:19.'),
 ('2019-01-11 10:40:19.0'),
 ('2019-01-11 10:40:19.1')
 ) t;{code}
The output is:
{code:java}
+-----------------------+
| cast(ts as timestamp) |
+-----------------------+
| 2019-01-11 10:40:18   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19   |
| 2019-01-11 10:40:19.1 |
+-----------------------+
{code}

As we can see, values of the same column are returned in two different 
formats. This inconsistency breaks some downstream use cases.

The reason is that Impala uses 
boost::posix_time::to_simple_string(time_duration) to convert the timestamp to 
a string, and to_simple_string() removes the fractional seconds when they are 
all zeros. Perhaps we can append ".0" when the length of the time string is 8 
(HH:MM:SS).

For now, we can work around it by using from_timestamp(ts, '-mm-dd 
hh:mm.ss.s') to unify the output (converting to string), or by using 
millisecond(ts) to get the fractional seconds.
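The proposed fix could be sketched roughly as a post-processing step on the formatted string: when the time-of-day part is exactly 8 characters (HH:MM:SS), append ".0". The function and names below are illustrative, not Impala's actual code:

```python
def normalize_timestamp(ts):
    """Ensure a 'YYYY-MM-DD HH:MM:SS[.fff]' string always carries a fraction.

    boost's to_simple_string() drops an all-zero fractional part; appending
    '.0' whenever the time-of-day part is exactly 8 chars (HH:MM:SS) keeps
    every row of the column in one consistent shape.
    """
    date_part, _, time_part = ts.partition(" ")
    if len(time_part) == 8:  # no fractional seconds present
        time_part += ".0"
    return date_part + " " + time_part

assert normalize_timestamp("2019-01-11 10:40:18") == "2019-01-11 10:40:18.0"
assert normalize_timestamp("2019-01-11 10:40:19.1") == "2019-01-11 10:40:19.1"
```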





[jira] [Created] (IMPALA-8107) Support EXEC_TIME_LIMIT_S in resource pool setting

2019-01-24 Thread Quanlong Huang (JIRA)
Quanlong Huang created IMPALA-8107:
--

 Summary: Support EXEC_TIME_LIMIT_S in resource pool setting
 Key: IMPALA-8107
 URL: https://issues.apache.org/jira/browse/IMPALA-8107
 Project: IMPALA
  Issue Type: New Feature
Reporter: Quanlong Huang


Timeout limits should differ across kinds of queries. For example, a resource 
pool for ad-hoc queries may set EXEC_TIME_LIMIT_S to 60s, while a resource 
pool for building pre-aggregations or other ETL may need a larger 
EXEC_TIME_LIMIT_S such as 30 minutes.


