[jira] [Resolved] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()

2019-04-29 Thread Daniel Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8381.
---
Resolution: Done

Implemented and merged.

> Remove branch from ParquetPlainEncoder::Decode()
> 
>
> Key: IMPALA-8381
> URL: https://issues.apache.org/jira/browse/IMPALA-8381
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: newbie, parquet, performance, ramp-up
>
> Removing the "if" at
> https://github.com/apache/impala/blob/5670f96b828d57f9e36510bb9af02bcc31de775c/be/src/exec/parquet/parquet-common.h#L203
> can lead to a 1.5x speedup in plain decoding (type=int32, stride=16). For 
> primitive types, the same check can be done once for a whole batch, so the 
> speedup can be gained for large batches without losing safety. The only 
> Parquet type where this check is needed per element is BYTE_ARRAY (typically 
> used for STRING columns), which already has a template specialization of 
> ParquetPlainEncoder::Decode().
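
The idea of hoisting the bounds check out of the per-value path can be sketched as follows. This is only an illustration under simplifying assumptions, not the Impala code: DecodeOneChecked and DecodeBatch are hypothetical names, values are fixed-size and stored back to back, and the output is a plain contiguous array rather than a strided tuple buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical per-value decoder: checks the buffer bounds on every call.
// Returns the number of bytes consumed, or -1 if the value would overrun.
template <typename T>
int DecodeOneChecked(const uint8_t* buffer, const uint8_t* buffer_end, T* out) {
  assert(out != nullptr);
  if (buffer_end - buffer < static_cast<int64_t>(sizeof(T))) return -1;
  std::memcpy(out, buffer, sizeof(T));
  return static_cast<int>(sizeof(T));
}

// Hypothetical batched decoder: one bounds check covers the whole batch, so
// the copy loop itself contains no branch on the buffer size.
// Returns the number of bytes consumed, or -1 if the batch would overrun.
template <typename T>
int64_t DecodeBatch(const uint8_t* buffer, const uint8_t* buffer_end,
                    int64_t num_values, T* out) {
  assert(out != nullptr);
  if (num_values < 0) return -1;
  const int64_t available = buffer_end - buffer;
  // Compare value counts rather than byte counts to avoid overflow in
  // num_values * sizeof(T).
  if (available / static_cast<int64_t>(sizeof(T)) < num_values) return -1;
  for (int64_t i = 0; i < num_values; ++i) {
    std::memcpy(&out[i], buffer + i * sizeof(T), sizeof(T));
  }
  return num_values * static_cast<int64_t>(sizeof(T));
}
```

Variable-length values such as BYTE_ARRAY cannot use the batched form, because the size of each value is only known after reading its length prefix.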



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8467) ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds

2019-05-03 Thread Daniel Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8467.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds
> --
>
> Key: IMPALA-8467
> URL: https://issues.apache.org/jira/browse/IMPALA-8467
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> This is an example of the logged failures:
> {code:java}
> 00:57:35.147 15/106 Test #15: parquet-plain-test ...***Failed 
> 0.48 sec
> 00:57:35.147 [==] Running 4 tests from 1 test case.
> 00:57:35.147 [--] Global test environment set-up.
> 00:57:35.148 [--] 4 tests from PlainEncoding
> 00:57:35.148 [ RUN ] PlainEncoding.Basic
> 00:57:35.148 =
> 00:57:35.148 ==1922==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow 
> on address 0x7ffe328ee44c at pc 0x017c07bc bp 0x7ffe328ee2f0 sp 
> 0x7ffe328edaa0
> 00:57:35.148 READ of size 16 at 0x7ffe328ee44c thread T0
> 00:57:35.148 #0 0x17c07bb in __asan_memcpy 
> /mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:466
> 00:57:35.149 #1 0x1837a26 in void 
> impala::ParquetPlainEncoder::DecodeNoBoundsCheck (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:332:3
> 00:57:35.149 #2 0x1837a26 in int 
> impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:223
> 00:57:35.150 #3 0x1837216 in void 
> impala::TestTypeWidening (parquet::Type::type)3>(impala::TimestampValue const&, int) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:115:22
> 00:57:35.150 #4 0x18122f7 in impala::PlainEncoding_Basic_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:155:3
> 00:57:35.151 #5 0x4fa6142 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4fa6142)
> 00:57:35.151 #6 0x4f9d909 in testing::Test::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9d909)
> 00:57:35.152 #7 0x4f9da57 in testing::TestInfo::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9da57)
> 00:57:35.152 #8 0x4f9db34 in testing::TestCase::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9db34)
> 00:57:35.153 #9 0x4f9edb7 in testing::internal::UnitTestImpl::RunAllTests() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9edb7)
> 00:57:35.153 #10 0x4f9f092 in testing::UnitTest::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9f092)
> 00:57:35.153 #11 0x181655f in main 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:491:1
> 00:57:35.154 #12 0x7ff7a10b2c04 in __libc_start_main 
> (/lib64/libc.so.6+0x21c04)
> 00:57:35.154 #13 0x17069d6 in _start 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x17069d6)
> 00:57:35.154
> 00:57:35.154 Address 0x7ffe328ee44c is located in stack of thread T0 at 
> offset 332 in frame
> 00:57:35.154 #0 0x18378df in int 
> impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:208
> 00:57:35.155
> 00:57:35.155 This frame has 4 object(s):
> 00:57:35.155 [32, 40) 'ref.tmp.i' (line 327)
> 00:57:35.155 [64, 68) 'ref.tmp2.i' (line 327)
> 00:57:35.155 [80, 96) 'ref.tmp5.i' (line 327)
> 00:57:35.155 [112, 120) 'ref.tmp6.i' (line 327) <== Memory access at offset 
> 332 overflows this variable
> 00:57:35.155 HINT: this may be a false positive if your program uses some 

[jira] [Created] (IMPALA-8710) Increase allowed bit width to 64 for bit packing

2019-06-26 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8710:
-

 Summary: Increase allowed bit width to 64 for bit packing
 Key: IMPALA-8710
 URL: https://issues.apache.org/jira/browse/IMPALA-8710
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Increase the allowed bit width for bit packing and bit unpacking to 64
 bits. This is needed to support Parquet Delta Encoding.
 
Also add new methods to BitWriter and BatchedBitReader handling Uleb and
 ZigZag integers for 64 bits.
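
For reference, the two integer encodings mentioned above can be sketched for 64-bit values as follows. This is a standalone illustration; the function names are hypothetical and are not the BitWriter or BatchedBitReader methods.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ZigZag maps signed integers to unsigned ones so that values of small
// magnitude get small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint64_t ZigZagEncode64(int64_t v) {
  const uint64_t sign_mask = v < 0 ? ~uint64_t{0} : uint64_t{0};
  return (static_cast<uint64_t>(v) << 1) ^ sign_mask;
}

inline int64_t ZigZagDecode64(uint64_t u) {
  // (0 - (u & 1)) is all ones for odd codes (negative values), else zero.
  return static_cast<int64_t>((u >> 1) ^ (uint64_t{0} - (u & 1)));
}

// ULEB128: 7 payload bits per byte, least significant group first; the high
// bit of each byte says whether another byte follows.
inline void PutUleb64(uint64_t v, std::vector<uint8_t>* out) {
  assert(out != nullptr);
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Returns false if the input ends early or does not fit into 64 bits.
inline bool GetUleb64(const uint8_t* data, size_t len, uint64_t* v,
                      size_t* consumed) {
  uint64_t result = 0;
  for (size_t i = 0; i < len && i < 10; ++i) {
    const uint8_t byte = data[i];
    // The tenth byte may only contribute the single remaining bit.
    if (i == 9 && byte > 1) return false;
    result |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {
      *v = result;
      *consumed = i + 1;
      return true;
    }
  }
  return false;
}
```

A 64-bit value needs at most 10 ULEB128 bytes, which is why the reader stops after the tenth byte.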





[jira] [Created] (IMPALA-8726) Autovectorisation leads to worse performance in bit unpacking

2019-06-28 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8726:
-

 Summary: Autovectorisation leads to worse performance in bit 
unpacking
 Key: IMPALA-8726
 URL: https://issues.apache.org/jira/browse/IMPALA-8726
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
 Attachments: no_vector.png

The compiler (GCC 4.9.2) autovectorises bit unpacking for bit widths 1, 2, 4 
and 8 (function BitPacking::UnpackValues), but this actually leads to worse 
performance (see the attached graph). We should consider whether it is worth 
disabling autovectorisation for bit unpacking, although future compiler 
versions may do a better job.





[jira] [Resolved] (IMPALA-8710) Increase allowed bit width to 64 for bit packing

2019-07-05 Thread Daniel Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8710.
---
Resolution: Implemented

> Increase allowed bit width to 64 for bit packing
> 
>
> Key: IMPALA-8710
> URL: https://issues.apache.org/jira/browse/IMPALA-8710
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Increase the allowed bit width for bit packing and bit unpacking to 64
>  bits. This is needed to support Parquet Delta Encoding.
>  
> Also add new methods to BitWriter and BatchedBitReader handling Uleb and
>  ZigZag integers for 64 bits.





[jira] [Created] (IMPALA-8741) Speed up bit unpacking by vectorisation

2019-07-05 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8741:
-

 Summary: Speed up bit unpacking by vectorisation
 Key: IMPALA-8741
 URL: https://issues.apache.org/jira/browse/IMPALA-8741
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Using compiler intrinsics for SIMD and bit manipulation instructions (AVX, AVX2 
and BMI2), we can speed up bit unpacking by a factor of about 2 to 8 depending 
on bit width, at most 16.

We need to take care to check that the required instructions are supported by 
the CPU the impalad is running on and fall back to the scalar implementation if 
not.
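
A minimal sketch of the runtime check with a scalar fallback is shown below. It is not the proposed implementation: it only uses BMI2 (no AVX/AVX2), it only unpacks 8 values of at most 8 bits each, and all function names are hypothetical. It assumes GCC or Clang on x86-64 for the fast path (the `target` attribute and `__builtin_cpu_supports`); on any other platform only the scalar path is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_BMI2_PATH 1
#endif

// Scalar fallback: unpacks 8 values of `bit_width` bits (1..8) into bytes.
// Reads exactly `bit_width` bytes from `in`.
inline void Unpack8Scalar(const uint8_t* in, int bit_width, uint8_t* out) {
  assert(bit_width >= 1 && bit_width <= 8);
  uint64_t packed = 0;
  for (int i = 0; i < bit_width; ++i) {
    packed |= static_cast<uint64_t>(in[i]) << (8 * i);
  }
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>((packed >> (i * bit_width)) & mask);
  }
}

#ifdef HAVE_BMI2_PATH
// BMI2 path: PDEP scatters consecutive input bits into the low `bit_width`
// bits of each output byte. Must only be called if the CPU supports BMI2.
__attribute__((target("bmi2")))
inline void Unpack8Bmi2(const uint8_t* in, int bit_width, uint8_t* out) {
  assert(bit_width >= 1 && bit_width <= 8);
  uint64_t packed = 0;
  std::memcpy(&packed, in, static_cast<size_t>(bit_width));
  const uint64_t mask =
      0x0101010101010101ULL * ((uint64_t{1} << bit_width) - 1);
  const uint64_t spread = _pdep_u64(packed, mask);
  std::memcpy(out, &spread, 8);
}
#endif

using Unpack8Fn = void (*)(const uint8_t*, int, uint8_t*);

// Picks the implementation once, based on what the running CPU supports.
inline Unpack8Fn ChooseUnpack8() {
#ifdef HAVE_BMI2_PATH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("bmi2")) return Unpack8Bmi2;
#endif
  return Unpack8Scalar;
}
```

Selecting a function pointer once, for example at startup, keeps the feature check out of the hot loop.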





[jira] [Created] (IMPALA-8796) Add unit tests to UnpackAndDecodeValues

2019-07-26 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8796:
-

 Summary: Add unit tests to UnpackAndDecodeValues
 Key: IMPALA-8796
 URL: https://issues.apache.org/jira/browse/IMPALA-8796
 Project: IMPALA
  Issue Type: Test
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


BitPacking::UnpackAndDecodeValues has no unit tests in bit-packing-test.cc.





[jira] [Resolved] (IMPALA-8833) Check failed: bit_width <= sizeof(T) * 8 (40 vs. 32) in BatchedBitReader::UnpackBatch()

2019-08-07 Thread Daniel Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8833.
---
Resolution: Fixed

> Check failed: bit_width <= sizeof(T) * 8 (40 vs. 32)  in 
> BatchedBitReader::UnpackBatch()
> 
>
> Key: IMPALA-8833
> URL: https://issues.apache.org/jira/browse/IMPALA-8833
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build, crash, flaky
>
> {noformat}
> F0801 21:24:10.571285 15993 bit-stream-utils.inline.h:126] 
> d04ba69d5da8ffd1:a9045b820001] Check failed: bit_width <= sizeof(T) * 8 
> (40 vs. 32) 
> *** Check failure stack trace: ***
> @  0x52f63ac  google::LogMessage::Fail()
> @  0x52f7c51  google::LogMessage::SendToLog()
> @  0x52f5d86  google::LogMessage::Flush()
> @  0x52f934d  google::LogMessageFatal::~LogMessageFatal()
> @  0x2b265b5  impala::BatchedBitReader::UnpackBatch<>()
> @  0x2ae8623  impala::RleBatchDecoder<>::FillLiteralBuffer()
> @  0x2b2cadb  impala::RleBatchDecoder<>::DecodeLiteralValues<>()
> @  0x2b27bfb  impala::DictDecoder<>::DecodeNextValue()
> @  0x2b16fed  
> impala::ScalarColumnReader<>::ReadSlotsNoConversion()
> @  0x2ac7252  impala::ScalarColumnReader<>::ReadSlots()
> @  0x2a76cef  
> impala::ScalarColumnReader<>::MaterializeValueBatchRepeatedDefLevel()
> @  0x2a58faa  impala::ScalarColumnReader<>::ReadValueBatch<>()
> @  0x2a20e8e  
> impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
> @  0x29b189c  impala::HdfsParquetScanner::AssembleRows()
> @  0x29ac6de  impala::HdfsParquetScanner::GetNextInternal()
> @  0x29aa656  impala::HdfsParquetScanner::ProcessSplit()
> @  0x249172d  impala::HdfsScanNode::ProcessSplit()
> @  0x2490902  impala::HdfsScanNode::ScannerThread()
> @  0x248fc8b  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x2492253  
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/6915
> Log lines around the failure:
> {noformat}
> [gw5] PASSED 
> query_test/test_scanners.py::TestParquet::test_bad_compression_codec[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: parquet/none]
> query_test/test_nested_types.py::TestMaxNestingDepth::test_load_hive_table[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> query_test/test_scanners.py::TestParquet::test_bad_compression_codec[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> [gw1] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q7[protocol: 
> beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q8[protocol: 
> beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> [gw1] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q8[protocol: 
> beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q10a[protocol: 
> beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> [gw10] PASSED 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_decimal_tbl[protocol:
>  bee

[jira] [Created] (IMPALA-8843) Restrict bit unpacking to unsigned integer types

2019-08-07 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8843:
-

 Summary: Restrict bit unpacking to unsigned integer types
 Key: IMPALA-8843
 URL: https://issues.apache.org/jira/browse/IMPALA-8843
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


Restrict bit unpacking to the unsigned integer types uint8_t, uint16_t, 
uint32_t and uint64_t. Unpacking to these types is straightforward, but 
unpacking to signed types is less so. Instead of bool, we can unpack to uint8_t 
and cast the result to bool where needed.





[jira] [Resolved] (IMPALA-8840) Check failed: num_bytes <= sizeof(T) (5 vs. 4)

2019-08-08 Thread Daniel Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8840.
---
Resolution: Fixed

> Check failed: num_bytes <= sizeof(T) (5 vs. 4) 
> ---
>
> Key: IMPALA-8840
> URL: https://issues.apache.org/jira/browse/IMPALA-8840
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Xiaomeng Zhang
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build, crash
>
> Not sure if this is due to the same issue as 
> https://issues.apache.org/jira/browse/IMPALA-8833#, but the error message is 
> a little different.
> {code:java}
> F0805 18:48:08.737411 5488 bit-stream-utils.inline.h:173] 
> 284731e5d1aad693:05c883020001] Check failed: num_bytes <= sizeof(T) (8 
> vs. 4)
> *** Check failure stack trace: ***
> @ 0x52fb9bc google::LogMessage::Fail()
> @ 0x52fd261 google::LogMessage::SendToLog()
> @ 0x52fb396 google::LogMessage::Flush()
> @ 0x52fe95d google::LogMessageFatal::~LogMessageFatal()
> @ 0x2b2b867 impala::BatchedBitReader::GetBytes<>()
> @ 0x2aeda65 impala::RleBatchDecoder<>::NextCounts()
> @ 0x2a82896 impala::RleBatchDecoder<>::NextNumRepeats()
> @ 0x2b1927f impala::ScalarColumnReader<>::ReadSlotsNoConversion()
> @ 0x2ac7c2c impala::ScalarColumnReader<>::ReadSlots()
> @ 0x2a7b861 
> impala::ScalarColumnReader<>::MaterializeValueBatchRepeatedDefLevel()
> @ 0x2a5b3b0 impala::ScalarColumnReader<>::ReadValueBatch<>()
> @ 0x2a256a4 impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
> @ 0x29b6eb6 impala::HdfsParquetScanner::AssembleRows()
> @ 0x29b1cf8 impala::HdfsParquetScanner::GetNextInternal()
> @ 0x29afc70 impala::HdfsParquetScanner::ProcessSplit()
> @ 0x2494bc3 impala::HdfsScanNode::ProcessSplit()
> @ 0x2493d98 impala::HdfsScanNode::ScannerThread()
> @ 0x2493121 
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x24956e9 
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x1ea0241 boost::function0<>::operator()()
> @ 0x23de77a impala::Thread::SuperviseThread()
> @ 0x23e6afe boost::_bi::list5<>::operator()<>()
> @ 0x23e6a22 boost::_bi::bind_t<>::operator()()
> @ 0x23e69e5 boost::detail::thread_data<>::run()
> @ 0x4224819 thread_proxy
> @ 0x7fc1818c5e24 start_thread
> @ 0x7fc17e01f34c __clone
> {code}





[jira] [Created] (IMPALA-8846) Undefined behaviour in RleEncoder::Put

2019-08-08 Thread Daniel Becker (JIRA)
Daniel Becker created IMPALA-8846:
-

 Summary: Undefined behaviour in RleEncoder::Put
 Key: IMPALA-8846
 URL: https://issues.apache.org/jira/browse/IMPALA-8846
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


On line 
[https://github.com/apache/impala/blob/4000da35be69e469500f5f11e0e5fdec119cf5c7/be/src/util/rle-encoding.h#L346],
 we test repeat_count_ <= std::numeric_limits<int>::max(), which is always 
true (repeat_count_ is an int). We then increment repeat_count_, which could 
already be std::numeric_limits<int>::max() and would overflow, which is 
undefined behaviour for signed integers.

 

We should either change <= to < or if we think that this never happens, remove 
the misleading check.

If we correct the check, it may lead to some (probably small) performance 
regression because the compiler could have optimised this out.
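
The difference between the two comparisons can be illustrated with a small standalone counter. RunCounter is a hypothetical type used only for this example; it is not the RleEncoder code.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a signed run-length counter.
struct RunCounter {
  int repeat_count = 0;

  // Increments the counter unless that would overflow.
  // Returns false if the counter is already at its maximum.
  bool Increment() {
    assert(repeat_count >= 0);
    // Writing `<=` here would be true for every int, so it would not guard
    // the increment below; `<` leaves room for exactly one more step.
    if (repeat_count < std::numeric_limits<int>::max()) {
      ++repeat_count;
      return true;
    }
    return false;
  }
};
```

Because signed overflow is undefined, a compiler is allowed to assume it never happens, which is why the always-true `<=` form may be removed entirely during optimisation.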





[jira] [Resolved] (IMPALA-8846) Undefined behaviour in RleEncoder::Put

2019-08-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8846.
---
Resolution: Fixed

> Undefined behaviour in RleEncoder::Put
> --
>
> Key: IMPALA-8846
> URL: https://issues.apache.org/jira/browse/IMPALA-8846
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Attachments: original.txt, with_check.txt
>
>
> On line 
> [https://github.com/apache/impala/blob/4000da35be69e469500f5f11e0e5fdec119cf5c7/be/src/util/rle-encoding.h#L346],
>  we test repeat_count_ <= std::numeric_limits<int>::max(), which is always 
> true (repeat_count_ is an int). We then increment repeat_count_, which could 
> already be std::numeric_limits<int>::max() and would overflow, which is 
> undefined behaviour for signed integers.
>  
> We should either change <= to < or if we think that this never happens, 
> remove the misleading check.
> If we correct the check, it may lead to some (probably small) performance 
> regression because the compiler could have optimised this out.





[jira] [Resolved] (IMPALA-8796) Add unit tests to UnpackAndDecodeValues

2019-08-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8796.
---
Resolution: Fixed

> Add unit tests to UnpackAndDecodeValues
> ---
>
> Key: IMPALA-8796
> URL: https://issues.apache.org/jira/browse/IMPALA-8796
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Minor
>
> BitPacking::UnpackAndDecodeValues has no unit tests in bit-packing-test.cc.





[jira] [Resolved] (IMPALA-8710) Increase allowed bit width to 64 for bit packing

2019-08-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8710.
---
Resolution: Fixed

> Increase allowed bit width to 64 for bit packing
> 
>
> Key: IMPALA-8710
> URL: https://issues.apache.org/jira/browse/IMPALA-8710
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Increase the allowed bit width for bit packing and bit unpacking to 64
>  bits. This is needed to support Parquet Delta Encoding.
>  
> Also add new methods to BitWriter and BatchedBitReader handling Uleb and
>  ZigZag integers for 64 bits.





[jira] [Resolved] (IMPALA-8843) Restrict bit unpacking to unsigned integer types

2019-08-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8843.
---
Resolution: Fixed

> Restrict bit unpacking to unsigned integer types
> 
>
> Key: IMPALA-8843
> URL: https://issues.apache.org/jira/browse/IMPALA-8843
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Minor
>
> Restrict bit unpacking to the unsigned integer types uint8_t, uint16_t, 
> uint32_t and uint64_t. Unpacking to these types is straightforward, but 
> unpacking to signed types is less so. Instead of bool, we can unpack to 
> uint8_t and cast the result to bool where needed.





[jira] [Created] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails

2019-10-31 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-9111:
-

 Summary: Sorting 'Decimal16Value's with codegen enabled but 
codegen optimizations disabled fails
 Key: IMPALA-9111
 URL: https://issues.apache.org/jira/browse/IMPALA-9111
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker


Starting the Impala cluster with

```
bin/start-impala-cluster.py --impalad_args="-disable_optimization_passes"
```

the following query fails and Impala crashes:

```

SELECT d28_1
FROM functional.decimal_rtf_tbl ORDER BY d28_1;

```

This error happens if the inlining pass in OptimizeModule in 
be/src/codegen/llvm-codegen.cc is not run. The problem seems to occur only 
with decimals that need 16 bytes of storage. It may be an ABI incompatibility 
involving Decimal16Value.

Stack trace:
#0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x7fda6e64002a in __GI_abort () at abort.c:89
#2 0x7fda71707149 in os::abort(bool) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#3 0x7fda718bad27 in VMError::report_and_die() () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#4 0x7fda71710e4f in JVM_handle_linux_signal () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#6 
#7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, 
impala::ColumnType const&) ()
#8 0x7fd9c3438e25 in Compare ()
#9 0x02a26293 in impala::TupleRowComparator::Compare 
(rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at 
be/src/util/tuple-row-compare.h:98
#10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, 
this=0x1284e480) at be/src/util/tuple-row-compare.h:107
#11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, 
rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72
#12 0x02a27409 in impala::Sorter::TupleSorter::MedianOfThree 
(this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at 
be/src/runtime/sorter-ir.cc:214
#13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206
#14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:165
#15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, 
run=0x13974da0) at be/src/runtime/sorter.cc:755
#16 0x02a18e27 in impala::Sorter::SortCurrentInputRun (this=0x1284e3c0) 
at be/src/runtime/sorter.cc:956
#17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at 
be/src/runtime/sorter.cc:892
#18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:187
#19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:90
#20 0x020f289a in impala::FragmentInstanceState::Open (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:348
#21 0x020ef54c in impala::FragmentInstanceState::Exec (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:84
#22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, 
fis=0xe0571e0) at be/src/runtime/query-state.cc:650
#23 0x02101268 in impala::QueryStateoperator()(void) 
const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558
#24 0x02104c7d in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...)
 at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
#25 0x01f04b46 in boost::function0::operator() 
(this=0x7fd9c3c4bca0) at 
toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
#26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) (Python Exception  No type named class std::basic_string, std::allocator >::_Rep.: 
name=, Python Exception  No type named class 
std::basic_string, std::allocator >::_Rep.: 
category=, functor=..., parent_thread_info=0x7fd9c4c4d950, 
 thread_started=0x7fd9c4c4c8f0) at be/src/util/thread.cc:360
#27 0x02483e81 in boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0xd3857c0, 
 f=@0xd3857b8: 0x247b796 , impala::ThreadDebugInfo con

[jira] [Created] (IMPALA-9394) ASAN crash in exhaustive test

2020-02-18 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-9394:
-

 Summary: ASAN crash in exhaustive test
 Key: IMPALA-9394
 URL: https://issues.apache.org/jira/browse/IMPALA-9394
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
 Attachments: impalad.ERROR

In ASAN builds, running the test below crashes Impala:
{code:java}
./bin/impala-py.test 
tests/query_test/test_tablesample.py::TestTableSample::test_tablesample 
--exploration_strategy=exhaustive{code}
The crash happens at 
{code:java}
table_format: text/lzo/block{code}
See the attachment for the error log.





[jira] [Resolved] (IMPALA-9394) ASAN crash in exhaustive test

2020-03-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9394.
---
Resolution: Fixed

> ASAN crash in exhaustive test
> -
>
> Key: IMPALA-9394
> URL: https://issues.apache.org/jira/browse/IMPALA-9394
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Priority: Major
> Attachments: impalad.ERROR
>
>
> In ASAN builds, running the test below crashes Impala:
> {code:java}
> ./bin/impala-py.test 
> tests/query_test/test_tablesample.py::TestTableSample::test_tablesample 
> --exploration_strategy=exhaustive{code}
> The crash happens at 
> {code:java}
> table_format: text/lzo/block{code}
> See the attachment for the error log.





[jira] [Closed] (IMPALA-8741) Speed up bit unpacking by vectorisation

2020-06-05 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker closed IMPALA-8741.
-
Resolution: Abandoned

> Speed up bit unpacking by vectorisation
> ---
>
> Key: IMPALA-8741
> URL: https://issues.apache.org/jira/browse/IMPALA-8741
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Attachments: demo3.png
>
>
> Using compiler intrinsics for SIMD and bit manipulation instructions (AVX, 
> AVX2 and BMI2), we can speed up bit unpacking by a factor of about 2 to 8 
> depending on bit width, at most 16.
> We need to take care to check that the required instructions are supported by 
> the CPU the impalad is running on and fall back to the scalar implementation 
> if not.





[jira] [Resolved] (IMPALA-9747) More fine-grained codegen for text file scanners

2020-06-29 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9747.
---
Resolution: Implemented

> More fine-grained codegen for text file scanners
> 
>
> Key: IMPALA-9747
> URL: https://issues.apache.org/jira/browse/IMPALA-9747
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Daniel Becker
>Priority: Major
>
> Currently, if the materialization of any column cannot be codegen'd for some 
> reason (e.g. it is CHAR(N)), then codegen is cancelled for the whole text 
> scanner, see:
> https://github.com/apache/impala/blob/b5805de3e65fd1c7154e4169b323bb38ddc54f4f/be/src/exec/text-converter.cc#L112
> https://github.com/apache/impala/blob/58273fff601dcc763ac43f7cc275a174a2e18b6b/be/src/exec/hdfs-scanner.cc#L342
> It would be much better to use the non-codegen'd path only for the problematic 
> columns, use the codegen'd materialization for the rest, and always do 
> conjunct evaluation with codegen.
> The codegen'd path orders slots based on the conjuncts that use them and 
> evaluates each conjunct as soon as the slots it needs become available, so if 
> the row is dropped, the rest of the slots do not need to be materialized. A 
> simple solution would be to always do non-codegen'd slot materialization first 
> so that those slots are ready if a conjunct needs them. Moving the columns that 
> are not used by conjuncts to the end could be a further optimization.
> This came up during the materialization of BINARY columns, which need 
> base64 decoding during materialization.
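
One possible reading of the proposed slot ordering, including the further optimization, is sketched below. SlotInfo, MaterializationRank and OrderSlots are hypothetical names made up for this example and do not correspond to Impala classes; the sketch also ignores the per-conjunct ordering that the codegen'd path applies among the codegen'd slots.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a slot to materialize.
struct SlotInfo {
  std::string name;
  bool can_codegen;       // false for e.g. CHAR(N) in the example above
  bool used_by_conjunct;  // true if some conjunct reads this slot
};

// Lower rank is materialized earlier:
//   0: non-codegen'd slots that a conjunct needs (must be ready early)
//   1: codegen'd slots
//   2: non-codegen'd slots no conjunct needs (skipped if the row is dropped)
inline int MaterializationRank(const SlotInfo& s) {
  if (!s.can_codegen && s.used_by_conjunct) return 0;
  if (s.can_codegen) return 1;
  return 2;
}

// Reorders slots by rank, keeping the existing relative order within a rank.
inline void OrderSlots(std::vector<SlotInfo>* slots) {
  assert(slots != nullptr);
  std::stable_sort(slots->begin(), slots->end(),
                   [](const SlotInfo& a, const SlotInfo& b) {
                     return MaterializationRank(a) < MaterializationRank(b);
                   });
}
```

A stable sort is used so that any ordering already established among the codegen'd slots is preserved.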





[jira] [Resolved] (IMPALA-7923) DecimalValue should be marked as packed

2020-07-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-7923.
---
Resolution: Fixed

> DecimalValue should be marked as packed
> ---
>
> Key: IMPALA-7923
> URL: https://issues.apache.org/jira/browse/IMPALA-7923
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>
> IMPALA-7473 was a symptom of a more general problem: DecimalValue is not 
> guaranteed to be aligned by the Impala runtime, but the class is not marked 
> as packed and, under some circumstances, GCC will emit aligned loads of 
> value_ when value_ is an int128. 
> Testing helps confirm that the compiler does not emit the problematic loads 
> in practice, but it would be better to mark the struct as packed.
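
As a loose analogy only (this is Python, not the C++ class from the issue),
the difference between a naturally aligned and a packed layout can be seen
with the standard struct module. The one-byte tag plus 64-bit integer layout
below is hypothetical; struct has no 128-bit format code, so a 64-bit value
stands in for the int128.

```python
import struct

# Hypothetical layout: one tag byte followed by a 64-bit unsigned integer.
NATIVE = struct.Struct("@BQ")  # native alignment: padding may precede the integer
PACKED = struct.Struct("=BQ")  # no alignment: the integer directly follows the tag

# Number of padding bytes the native layout adds (platform dependent).
padding = NATIVE.size - PACKED.size

# A packed record can be read from any byte offset, including an odd one,
# because nothing assumes the integer sits on an aligned address.
buf = bytearray(3) + PACKED.pack(7, 2**40 + 5)
tag, value = PACKED.unpack_from(buf, 3)
```

Marking a C++ struct as packed makes a similar promise to the compiler: it
may no longer assume that the member is aligned, so it has to emit loads
that are safe for unaligned addresses.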





[jira] [Resolved] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails

2020-07-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9111.
---
Resolution: Fixed

> Sorting 'Decimal16Value's with codegen enabled but codegen optimizations 
> disabled fails
> ---
>
> Key: IMPALA-9111
> URL: https://issues.apache.org/jira/browse/IMPALA-9111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: crash
>
> Starting the Impala cluster with
> {code:java}
> bin/start-impala-cluster.py 
> --impalad_args="-disable_optimization_passes"{code}
>  
> the following query fails and Impala crashes:
>  
> {code:java}
> SELECT d28_1
>  FROM functional.decimal_rtf_tbl ORDER BY d28_1;{code}
>  
> This error happens if the inlining pass in OptimizeModule in 
> be/src/codegen/llvm-codegen.cc is not run. It seems the problem only happens 
> with decimals that need to be stored in 16 bytes. Maybe it is some ABI 
> incompatibility with Decimal16Value.
> Stack trace:
> {code:java}
> #0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1 0x7fda6e64002a in __GI_abort () at abort.c:89
> #2 0x7fda71707149 in os::abort(bool) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #3 0x7fda718bad27 in VMError::report_and_die() () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #4 0x7fda71710e4f in JVM_handle_linux_signal () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #6 
> #7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, 
> impala::ColumnType const&) ()
> #8 0x7fd9c3438e25 in Compare ()
> #9 0x02a26293 in impala::TupleRowComparator::Compare 
> (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at 
> be/src/util/tuple-row-compare.h:98
> #10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, 
> this=0x1284e480) at be/src/util/tuple-row-compare.h:107
> #11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, 
> rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72
> #12 0x02a27409 in impala::Sorter::TupleSorter::MedianOfThree 
> (this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at 
> be/src/runtime/sorter-ir.cc:214
> #13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot 
> (this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206
> #14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper 
> (this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:165
> #15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, 
> run=0x13974da0) at be/src/runtime/sorter.cc:755
> #16 0x02a18e27 in impala::Sorter::SortCurrentInputRun 
> (this=0x1284e3c0) at be/src/runtime/sorter.cc:956
> #17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at 
> be/src/runtime/sorter.cc:892
> #18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, 
> state=0x11e652a0) at be/src/exec/sort-node.cc:187
> #19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, 
> state=0x11e652a0) at be/src/exec/sort-node.cc:90
> #20 0x020f289a in impala::FragmentInstanceState::Open 
> (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:348
> #21 0x020ef54c in impala::FragmentInstanceState::Exec 
> (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:84
> #22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, 
> fis=0xe0571e0) at be/src/runtime/query-state.cc:650
> #23 0x02101268 in impala::QueryStateoperator()(void) 
> const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558
> #24 0x02104c7d in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
> at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #25 0x01f04b46 in boost::function0::operator() 
> (this=0x7fd9c3c4bca0) at 
> toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> #26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (Python Exception  'gdb.error'> No type named class std::basic_string std::char_traits, std::allocator >::_Rep.: 
> name=, Python Exception  No type named class 
> std::basic_string, std::allocator 
> >::_Rep.: 
> category=, functor=...

[jira] [Resolved] (IMPALA-5444) Asynchronous code generation

2020-07-16 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-5444.
---
Resolution: Implemented

> Asynchronous code generation
> 
>
> Key: IMPALA-5444
> URL: https://issues.apache.org/jira/browse/IMPALA-5444
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: codegen
>
> Currently, codegen happens during the preparation phase of a query fragment. 
> In other words, the query fragment cannot start running until code 
> generation is complete. For some queries, code generation takes a huge 
> amount of time. While we should disable codegen in some exec nodes if we can 
> accurately estimate in the planner that running without codegen will be 
> better (e.g. the number of rows to process is relatively small), we will 
> still pay the price if, say, the stats are stale or the estimation is off.
> With async codegen, the idea is that we should run the code generation in a 
> separate thread so that codegen is not on the critical path of the query 
> execution. Once codegen completes for a fragment, we can atomically swap the 
> function pointers of compiled functions embedded in the exec nodes. The exec 
> nodes all currently support falling back to interpretation if the codegen'd 
> functions don't exist anyway (i.e. the pointer to the compiled function is 
> NULL). In some cases, the query may run to completion before codegen 
> completes. Once IMPALA-3259 is fixed (if feasible), we should be able to 
> cancel the codegen execution.
> Another thing to note is that we should be able to bound the codegen work to 
> a set of threads in a thread pool so as to control the CPU and memory 
> resources consumed by codegen.
> Another potential extension of this decoupling is IMPALA-9660.
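
The idea described above (run interpreted until a background compile
finishes, then swap in the compiled function) can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
Impala implementation: ExecNode, process() and codegen() are hypothetical
names, the "compiled" function is just a lambda, and the Event stands in for
a slow compilation step.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ExecNode:
    def __init__(self):
        # Plays the role of the NULL function pointer from the description.
        self._compiled_fn = None

    def _interpreted(self, x):
        return x * 2 + 1

    def process(self, x):
        # Read the reference once; in CPython a reference assignment is atomic,
        # so a reader sees either None or the fully built function.
        fn = self._compiled_fn
        if fn is None:
            return ("interpreted", self._interpreted(x))
        return ("compiled", fn(x))


def codegen(node, gate):
    gate.wait()  # stands in for an expensive compilation
    node._compiled_fn = lambda x: x * 2 + 1  # swap in the "compiled" function


# A bounded pool limits how many codegen tasks run at once.
pool = ThreadPoolExecutor(max_workers=1)
node = ExecNode()
gate = threading.Event()
future = pool.submit(codegen, node, gate)

before = node.process(20)  # codegen not finished: interpreted path
gate.set()
future.result()            # wait until the swap has happened
after = node.process(20)   # now served by the "compiled" function
pool.shutdown()
```

Both paths return the same value; only the path taken changes once the
background task completes, so query execution never waits for codegen.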





[jira] [Resolved] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2020-07-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-7655.
---
Resolution: Fixed

> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
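> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+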
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select 
> count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca26
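> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+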
> {noformat}
> It turns out that this is because we don't have good codegen support for 
> ConditionalFunction, and just fall back to emitting a call to the interpreted 
> path: 
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support: 
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178





[jira] [Created] (IMPALA-9984) Implement codegen for TupleIsNullPredicate

2020-07-21 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-9984:
-

 Summary: Implement codegen for TupleIsNullPredicate
 Key: IMPALA-9984
 URL: https://issues.apache.org/jira/browse/IMPALA-9984
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


IMPALA-7657 left codegen for TupleIsNullPredicate unimplemented.





[jira] [Resolved] (IMPALA-9984) Implement codegen for TupleIsNullPredicate

2020-08-05 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9984.
---
Resolution: Implemented

> Implement codegen for TupleIsNullPredicate
> --
>
> Key: IMPALA-9984
> URL: https://issues.apache.org/jira/browse/IMPALA-9984
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> IMPALA-7657 left codegen for TupleIsNullPredicate unimplemented.





[jira] [Created] (IMPALA-10078) Proper codegen for KuduPartitionExpr

2020-08-12 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10078:
--

 Summary: Proper codegen for KuduPartitionExpr
 Key: IMPALA-10078
 URL: https://issues.apache.org/jira/browse/IMPALA-10078
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


Implement codegen for KuduPartitionExpr and remove the use of 
GetCodegendComputeFnWrapper.





[jira] [Resolved] (IMPALA-7658) Proper codegen for HiveUdfCall

2020-09-10 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-7658.
---
Resolution: Implemented

> Proper codegen for HiveUdfCall
> --
>
> Key: IMPALA-7658
> URL: https://issues.apache.org/jira/browse/IMPALA-7658
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: codegen, performance
>
> This function uses GetCodegendComputeFnWrapper() to call the interpreted path 
> but instead we could codegen the Evaluate() function to reduce the overhead. 
> I think this is likely to be a little involved since there's a loop to 
> unroll, so the solution might end up looking like IMPALA-5168





[jira] [Resolved] (IMPALA-7656) Remove all uses of GetCodegendComputeFnWrapper()

2020-09-28 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-7656.
---
Resolution: Fixed

> Remove all uses of GetCodegendComputeFnWrapper()
> 
>
> Key: IMPALA-7656
> URL: https://issues.apache.org/jira/browse/IMPALA-7656
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: codegen
>
> We should really get rid of all uses of this function. It was a stopgap to 
> add codegen support to expressions without really doing the work, but its 
> output can be 10x slower than doing it properly, e.g. see IMPALA-7655.





[jira] [Resolved] (IMPALA-10078) Proper codegen for KuduPartitionExpr

2020-09-28 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10078.

Resolution: Implemented

> Proper codegen for KuduPartitionExpr
> 
>
> Key: IMPALA-10078
> URL: https://issues.apache.org/jira/browse/IMPALA-10078
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Implement codegen for KuduPartitionExpr and remove the use of 
> GetCodegendComputeFnWrapper.





[jira] [Created] (IMPALA-10196) Remove LlvmCodeGen::CastPtrToLlvmPtr

2020-09-28 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10196:
--

 Summary: Remove LlvmCodeGen::CastPtrToLlvmPtr
 Key: IMPALA-10196
 URL: https://issues.apache.org/jira/browse/IMPALA-10196
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


LlvmCodeGen::CastPtrToLlvmPtr embeds a pointer that points to data in the 
current process's memory into codegen'd IR code. Our long term goal is to share 
the codegen'd IR among processes working on the same fragment, which is not 
possible if the IR contains pointers pointing to data of a specific process. A 
step in making the IR independent of the process generating it is removing 
LlvmCodeGen::CastPtrToLlvmPtr.





[jira] [Resolved] (IMPALA-10196) Remove LlvmCodeGen::CastPtrToLlvmPtr

2020-09-30 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10196.

Resolution: Fixed

> Remove LlvmCodeGen::CastPtrToLlvmPtr
> 
>
> Key: IMPALA-10196
> URL: https://issues.apache.org/jira/browse/IMPALA-10196
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> LlvmCodeGen::CastPtrToLlvmPtr embeds a pointer that points to data in the 
> current process's memory into codegen'd IR code. Our long term goal is to 
> share the codegen'd IR among processes working on the same fragment, which is 
> not possible if the IR contains pointers pointing to data of a specific 
> process. A step in making the IR independent of the process generating it is 
> removing LlvmCodeGen::CastPtrToLlvmPtr.





[jira] [Created] (IMPALA-10332) Add file formats to HdfsScanNode's thrift representation and codegen for those

2020-11-17 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10332:
--

 Summary: Add file formats to HdfsScanNode's thrift representation 
and codegen for those
 Key: IMPALA-10332
 URL: https://issues.apache.org/jira/browse/IMPALA-10332
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend, Frontend
Reporter: Daniel Becker
Assignee: Daniel Becker


List all file formats that a HdfsScanNode needs to process in any fragment 
instance. It is possible that some file formats will not be needed in all 
fragment instances.

This is a step towards sharing codegen between different impala backends. Using 
the file formats provided in the thrift file, a backend can codegen code for 
file formats that are not needed in its own process but are needed in other 
fragment instances running on other backends, and the resulting binary can be 
shared between multiple backends.

Codegenning for file formats will be done based on the thrift message and not 
on what is needed for the actual backend. This leads to some extra work in case 
a file format is not needed for the current backend and codegen sharing is not 
available (at this point it is not implemented). However, the overall number of 
such cases is low.

Also add the file formats to the node's explain string.
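
The consequence described above, that a backend may codegen for formats it
does not itself read, can be illustrated with a small Python sketch. The
instance names, the format strings and both helper functions are
hypothetical; this only shows the set arithmetic, not Impala's thrift
structures.

```python
def formats_to_codegen(formats_by_instance):
    """Union of file formats over all fragment instances of one scan node."""
    needed = set()
    for formats in formats_by_instance.values():
        needed.update(formats)
    return sorted(needed)


def extra_formats(local_instance, formats_by_instance):
    """Formats a backend would codegen although its own instance never reads them."""
    all_formats = set(formats_to_codegen(formats_by_instance))
    return sorted(all_formats - set(formats_by_instance[local_instance]))


# Hypothetical assignment of file formats to fragment instances.
formats_by_instance = {
    "instance-0": {"PARQUET"},
    "instance-1": {"PARQUET", "TEXT"},
    "instance-2": {"ORC"},
}
codegen_list = formats_to_codegen(formats_by_instance)
wasted_on_instance_0 = extra_formats("instance-0", formats_by_instance)
```

The second result is the "extra work" the description mentions: it is
wasted only as long as the generated code cannot be shared between backends.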





[jira] [Resolved] (IMPALA-10332) Add file formats to HdfsScanNode's thrift representation and codegen for those

2020-11-24 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10332.

Resolution: Implemented

> Add file formats to HdfsScanNode's thrift representation and codegen for those
> --
>
> Key: IMPALA-10332
> URL: https://issues.apache.org/jira/browse/IMPALA-10332
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> List all file formats that a HdfsScanNode needs to process in any fragment 
> instance. It is possible that some file formats will not be needed in all 
> fragment instances.
> This is a step towards sharing codegen between different impala backends. 
> Using the file formats provided in the thrift file, a backend can codegen 
> code for file formats that are not needed in its own process but are needed 
> in other fragment instances running on other backends, and the resulting 
> binary can be shared between multiple backends.
> Codegenning for file formats will be done based on the thrift message and not 
> on what is needed for the actual backend. This leads to some extra work in 
> case a file format is not needed for the current backend and codegen sharing 
> is not available (at this point it is not implemented). However, the overall 
> number of such cases is low.
> Also add the file formats to the node's explain string.





[jira] [Resolved] (IMPALA-10371) test_java_udfs crash impalad if result spooling is enabled

2021-02-24 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10371.

Resolution: Fixed

> test_java_udfs crash impalad if result spooling is enabled
> --
>
> Key: IMPALA-10371
> URL: https://issues.apache.org/jira/browse/IMPALA-10371
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Riza Suminto
>Assignee: Daniel Becker
>Priority: Blocker
> Attachments: 46a19881-resolved.txt, hs_err_pid12878.log
>
>
> The following test query from TestUdfExecution::test_java_udfs crashes impalad 
> when result spooling is enabled.
> {code:java}
> select throws_exception() from functional.alltypestiny{code}
> The following is a truncated JVM crash log related to the crash
> {code:java}
> ---  T H R E A D  ---
> Current thread (0x0fb4c000):  JavaThread "Thread-700" [_thread_in_native, id=30853, stack(0x7f79715ff000,0x7f7971dff000)]
> Stack: [0x7f79715ff000,0x7f7971dff000],  sp=0x7f7971dfa280,  free space=8172k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> V  [libjvm.so+0xb6b032]
> V  [libjvm.so+0x4f14bd]
> V  [libjvm.so+0x80fa8f]
> V  [libjvm.so+0x7e0991]
> V  [libjvm.so+0x69fa10]
> j  
> org.apache.impala.TestUdfException.evaluate()Lorg/apache/hadoop/io/BooleanWritable;+9
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x6af9ba]
> V  [libjvm.so+0xa1def8]
> V  [libjvm.so+0xa1f8d5]
> V  [libjvm.so+0x7610f8]  JVM_InvokeMethod+0x128
> J 2286  
> sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (0 bytes) @ 0x7f7acb553ced [0x7f7acb553c00+0xed]
> J 6921 C2 
> sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (104 bytes) @ 0x7f7acbd1de38 [0x7f7acbd1ddc0+0x78]
> J 3645 C2 org.apache.impala.hive.executor.UdfExecutor.evaluate()V (396 bytes) 
> @ 0x7f7acaf6e894 [0x7f7acaf6e640+0x254]
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x6af9ba]
> V  [libjvm.so+0x72c046]
> V  [libjvm.so+0x730523]
> C  0x7f7ab4c5d0d0
> C  [impalad+0x26a2648]  
> impala::ScalarExprEvaluator::GetValue(impala::ScalarExpr const&, 
> impala::TupleRow const*)+0x7a
> C  [impalad+0x26a25cb]  
> impala::ScalarExprEvaluator::GetValue(impala::TupleRow const*)+0x2b
> C  [impalad+0x21f4f78]  
> impala::AsciiQueryResultSet::AddRows(std::vector  std::allocator > const&, impala::RowBatch*, 
> int, int)+0x4c2
> C  [impalad+0x25c5862]  
> impala::BufferedPlanRootSink::GetNext(impala::RuntimeState*, 
> impala::QueryResultSet*, int, bool*, long)+0x70c
> C  [impalad+0x296cf17]  impala::Coordinator::GetNext(impala::QueryResultSet*, 
> int, bool*, long)+0x557
> C  [impalad+0x219f5fe]  impala::ClientRequestState::FetchRowsInternal(int, 
> impala::QueryResultSet*, long)+0x6b2
> C  [impalad+0x219d98e]  impala::ClientRequestState::FetchRows(int, 
> impala::QueryResultSet*, long)+0x46
> C  [impalad+0x21c1d29]  
> impala::ImpalaServer::FetchInternal(impala::TUniqueId, bool, int, 
> beeswax::Results*)+0x717
> C  [impalad+0x21bbde9]  impala::ImpalaServer::fetch(beeswax::Results&, 
> beeswax::QueryHandle const&, bool, int)+0x577
> {code}
> If result spooling is enabled, BufferedPlanRootSink will be used and 
> ScalarExprEvaluation will be called in BufferedPlanRootSink::GetNext, leading 
> to this crash.
> Without result spooling, BlockingPlanRootSink will be used and 
> ScalarExprEvaluation is called in BlockingPlanRootSink::Send. No crash 
> happens when result spooling is disabled.
> Attached are the full JVM crash log and the resolved minidump.
>  





[jira] [Created] (IMPALA-10640) Support reading Parquet Bloom filters - most common types

2021-04-07 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10640:
--

 Summary: Support reading Parquet Bloom filters - most common types
 Key: IMPALA-10640
 URL: https://issues.apache.org/jira/browse/IMPALA-10640
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Support reading Parquet Bloom filters for the most common types: integers, 
float, double and Impala strings. Supporting these types is relatively easy in 
comparison to most other types. Support for other types may be added later.
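
For readers unfamiliar with the data structure, a generic Bloom filter and
its use for skipping data can be sketched in Python as below. This is not
the Parquet format: the Parquet specification defines a split block Bloom
filter based on xxHash, whereas this sketch uses a plain bit array with
SHA-256 purely for illustration. The class, its parameters and
can_skip_row_group() are hypothetical names.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        # num_bits is assumed to be a multiple of 8.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value: bytes):
        # Derive num_hashes bit positions by salting the value with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + value).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, value: bytes):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def can_skip_row_group(bloom: BloomFilter, value: bytes) -> bool:
    """An equality predicate on `value` can skip data whose filter rules it out."""
    return not bloom.might_contain(value)


row_group_filter = BloomFilter()
for v in (b"1", b"42", b"impala"):
    row_group_filter.insert(v)
```

A reader may only trust the negative answer: inserted values are never
reported as absent, while absent values may occasionally be reported as
possibly present.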





[jira] [Created] (IMPALA-10641) Support reading Parquet Bloom filters - missing types

2021-04-07 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10641:
--

 Summary: Support reading Parquet Bloom filters - missing types
 Key: IMPALA-10641
 URL: https://issues.apache.org/jira/browse/IMPALA-10641
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Daniel Becker


This Jira tracks the addition of read support for Parquet Bloom filters for the 
types not dealt with in IMPALA-10640.





[jira] [Created] (IMPALA-10642) Write support for Parquet Bloom filters - most common types

2021-04-07 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10642:
--

 Summary: Write support for Parquet Bloom filters - most common 
types
 Key: IMPALA-10642
 URL: https://issues.apache.org/jira/browse/IMPALA-10642
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Support writing Parquet Bloom filters for the most common types: integers, 
float, double and Impala strings. Support for other types may be added later.





[jira] [Resolved] (IMPALA-10640) Support reading Parquet Bloom filters - most common types

2021-07-14 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10640.

Resolution: Implemented

> Support reading Parquet Bloom filters - most common types
> -
>
> Key: IMPALA-10640
> URL: https://issues.apache.org/jira/browse/IMPALA-10640
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: parquet
>
> Support reading Parquet Bloom filters for the most common types: integers, 
> float, double and Impala strings. Supporting these types is relatively easy 
> in comparison to most other types. Support for other types may be added later.





[jira] [Resolved] (IMPALA-10642) Write support for Parquet Bloom filters - most common types

2021-07-14 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10642.

Resolution: Implemented

> Write support for Parquet Bloom filters - most common types
> ---
>
> Key: IMPALA-10642
> URL: https://issues.apache.org/jira/browse/IMPALA-10642
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Support writing Parquet Bloom filters for the most common types: integers, 
> float, double and Impala strings. Support for other types may be added later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10826) Test failure in TestEventProcessing.test_transactional_insert_events

2021-07-26 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10826:
--

 Summary: Test failure in 
TestEventProcessing.test_transactional_insert_events
 Key: IMPALA-10826
 URL: https://issues.apache.org/jira/browse/IMPALA-10826
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.1
Reporter: Daniel Becker
Assignee: Zoltán Borók-Nagy


The test 
{code:java}
custom_cluster.test_event_processing.TestEventProcessing.test_transactional_insert_events{code}
failed after 3045f585dd64b8d92ba2f126264a0c0e20d4a4dd was merged.

Stack trace:
{code:java}
custom_cluster/test_event_processing.py:99: in test_transactional_insert_events
    self.run_test_insert_events(unique_database, is_transactional=True)
custom_cluster/test_event_processing.py:139: in run_test_insert_events
    assert data.split('\t') == ['101', '200']
E   AttributeError: 'NoneType' object has no attribute 'split'{code}





[jira] [Created] (IMPALA-10851) Codegen for structs

2021-08-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10851:
--

 Summary: Codegen for structs
 Key: IMPALA-10851
 URL: https://issues.apache.org/jira/browse/IMPALA-10851
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Reporter: Daniel Becker


IMPALA-9495 adds support for struct types in SELECT lists but only with codegen 
turned off. We should remove this restriction either by implementing codegen 
for struct types or calling interpreted code from codegen code to handle 
structs. This latter option is still better than turning off codegen completely 
because other parts of the query that do not handle structs could benefit from 
codegen.





[jira] [Created] (IMPALA-10929) Optimise memory usage of structs in tuples

2021-09-23 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10929:
--

 Summary: Optimise memory usage of structs in tuples
 Key: IMPALA-10929
 URL: https://issues.apache.org/jira/browse/IMPALA-10929
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Daniel Becker


If we have both a whole struct and one of its members (or a member of a member 
etc.) in the select list, the whole struct and the member are assigned to 
different slots in the tuple. We could use less memory if the member expression 
used the slot within the whole struct instead.

Example:
For the query 
{code:java}
select id, outer_struct from functional_orc_def.complextypes_nested_structs;
{code}
the row size is 64B, while for

{code:java}
select id, outer_struct, outer_struct.inner_struct2 from 
functional_orc_def.complextypes_nested_structs;
{code}
it is 80B, although it should not need more memory.

This is not limited to the select list; it should also work with where clauses 
etc., for example

{code:java}
select id, outer_struct from functional_orc_def.complextypes_nested_structs 
where outer_struct.inner_struct2.i > 1;
{code}
should also have a row size of 64B instead of 68B.





[jira] [Created] (IMPALA-10983) Test  metadata.test_event_processing.TestEventProcessing.test_insert_events fails

2021-10-25 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-10983:
--

 Summary: Test  
metadata.test_event_processing.TestEventProcessing.test_insert_events  fails
 Key: IMPALA-10983
 URL: https://issues.apache.org/jira/browse/IMPALA-10983
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker


Test 
{code:java}
metadata.test_event_processing.TestEventProcessing.test_insert_events {code}
fails:
h3. Error Message
{code:java}
metadata/test_event_processing.py:48: in test_insert_events
    self.run_test_insert_events(unique_database)
metadata/test_event_processing.py:128: in run_test_insert_events
    EventProcessorUtils.wait_for_event_processing(self)
util/event_processor_utils.py:61: in wait_for_event_processing
    within {1} seconds".format(current_event_id, timeout))
E   Exception: Event processor did not sync till last known event id 31772 within 10 seconds{code}
h3. Stacktrace
{code:java}
metadata/test_event_processing.py:48: in test_insert_events
    self.run_test_insert_events(unique_database)
metadata/test_event_processing.py:128: in run_test_insert_events
    EventProcessorUtils.wait_for_event_processing(self)
util/event_processor_utils.py:61: in wait_for_event_processing
    within {1} seconds".format(current_event_id, timeout))
E   Exception: Event processor did not sync till last known event id 31772 within 10 seconds
{code}
h3. Standard Error
{code:java}
SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events;
-- connecting to: localhost:21000
-- connecting to localhost:21050 with impyla
-- 2021-10-22 15:28:00,639 INFO MainThread: Closing active operation
-- connecting to localhost:28000 with impyla
-- 2021-10-22 15:28:00,665 INFO MainThread: Closing active operation
-- connecting to localhost:11050 with impyla
SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events;
SET sync_ddl=False;
-- executing against localhost:21000
DROP DATABASE IF EXISTS `test_insert_events_4293827b` CASCADE;
-- 2021-10-22 15:28:03,902 INFO MainThread: Started query e74933cd62ece6eb:63eb6081
SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events;
SET sync_ddl=False;
-- executing against localhost:21000
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-11011) Impala crashes in OrcStructReader::NumElements()

2021-11-09 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11011:
--

 Summary: Impala crashes in OrcStructReader::NumElements()
 Key: IMPALA-11011
 URL: https://issues.apache.org/jira/browse/IMPALA-11011
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


Running the query
{code:java}
select inner_arr.ITEM
from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as 
inner_arr;{code}
in a non-full-acid version/copy of functional_orc_def.complextypestbl crashes 
Impala because in OrcStructReader::NumElements() 'vbatch_' is NULL and we 
dereference it.

Steps to reproduce:
1. Use Hive to create a non-full-acid copy of the table:
 * Open the Hive command line:

{code:java}
hive beeline -u 'jdbc:hive2://localhost:11050/default'{code}

 * Copy the table with this command:

{code:java}
create table complextypestbl_non_acid stored as orc tblproperties 
("transactional"="true", "transactional_properties"="insert_only") as select * 
from complextypestbl;{code}

2.  In Impala, run the query on the copied table:
{code:java}
set disable_codegen=true;
select inner_arr.ITEM
from functional_orc_def.complextypestbl_non_acid.nested_struct.c.d.ITEM as 
inner_arr;{code}
 

Call stack from GDB:
{code:java}
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7fd5e49e9921 in __GI_abort () at abort.c:79
#2  0x7fd5e7929589 in os::abort(bool) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#3  0x7fd5e7b04fb3 in VMError::report_and_die() () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#4  0x7fd5e7933ce4 in JVM_handle_linux_signal () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#5  0x7fd5e79263b8 in signalHandler(int, siginfo_t*, void*) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#6  
#7  0x02c3bd7f in impala::OrcStructReader::NumElements (this=0xf043290) 
at be/src/exec/orc-column-readers.h:603
#8  0x02c371b7 in impala::OrcListReader::NumElements (this=0x11009420) 
at be/src/exec/orc-column-readers.cc:563
#9  0x02c371b7 in impala::OrcListReader::NumElements (this=0x11009340) 
at be/src/exec/orc-column-readers.cc:563
#10 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf043200) 
at be/src/exec/orc-column-readers.h:606
#11 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf042ea0) 
at be/src/exec/orc-column-readers.h:606
#12 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf042e10) 
at be/src/exec/orc-column-readers.h:606
#13 0x02c3497f in impala::OrcStructReader::EndOfBatch (this=0xf042e10) 
at be/src/exec/orc-column-readers.cc:294
#14 0x02bf5389 in impala::HdfsOrcScanner::GetNextInternal 
(this=0xeca4000, row_batch=0xf1c95a0) at be/src/exec/hdfs-orc-scanner.cc:648
#15 0x02bf46b7 in impala::HdfsOrcScanner::ProcessSplit (this=0xeca4000) 
at be/src/exec/hdfs-orc-scanner.cc:588
#16 0x02d427ff in impala::HdfsScanNode::ProcessSplit (this=0xff85800, 
filter_ctxs=..., expr_results_pool=0x7fd41a29b4e0, scan_range=0xf2bde00, 
scanner_thread_reservation=0x7fd41a29b408) at be/src/exec/hdfs-scan-node.cc:500
#17 0x02d41b80 in impala::HdfsScanNode::ScannerThread (this=0xff85800, 
first_thread=false, scanner_thread_reservation=16384) at 
be/src/exec/hdfs-scan-node.cc:418
#18 0x02d40ee8 in impala::HdfsScanNodeoperator()(void) 
const (__closure=0x7fd41a29bc08) at be/src/exec/hdfs-scan-node.cc:339
#19 0x02d43afb in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...)
    at 
/opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
#20 0x022de8ca in boost::function0::operator() 
(this=0x7fd41a29bc00) at 
/opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
#21 0x02aa43a0 in 
impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) (name=..., category=..., 
functor=..., parent_thread_info=0x7fd40f8858a0, thread_started=0x7fd40f8846a0) 
at be/src/util/thread.cc:360
#22 0x02aacd01 in 
boost::_bi::list5, std::allocator > >, 
boost::_bi::value, 
std::allocator > >, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), 
std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::__cxx11::basic_string, 
std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, imp

[jira] [Resolved] (IMPALA-11011) Impala crashes in OrcStructReader::NumElements()

2021-11-15 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11011.

Resolution: Fixed

> Impala crashes in OrcStructReader::NumElements()
> 
>
> Key: IMPALA-11011
> URL: https://issues.apache.org/jira/browse/IMPALA-11011
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Running the query
> {code:java}
> select inner_arr.ITEM
> from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as 
> inner_arr;{code}
> in a non-full-acid version/copy of functional_orc_def.complextypestbl 
> crashes Impala because in OrcStructReader::NumElements() 'vbatch_' is NULL 
> and we dereference it.
> Steps to reproduce:
> 1. Use Hive to create a non-full-acid copy of the table:
>  * Open the Hive command line:
> {code:java}
> hive beeline -u 'jdbc:hive2://localhost:11050/default'{code}
>  * Copy the table with this command:
> {code:java}
> create table complextypestbl_non_acid stored as orc tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from complextypestbl;{code}
> 2.  In Impala, run the query on the copied table:
> {code:java}
> set disable_codegen=true;
> select inner_arr.ITEM
> from functional_orc_def.complextypestbl_non_acid.nested_struct.c.d.ITEM as 
> inner_arr;{code}
>  
> Call stack from GDB:
> {code:java}
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7fd5e49e9921 in __GI_abort () at abort.c:79
> #2  0x7fd5e7929589 in os::abort(bool) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #3  0x7fd5e7b04fb3 in VMError::report_and_die() () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #4  0x7fd5e7933ce4 in JVM_handle_linux_signal () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #5  0x7fd5e79263b8 in signalHandler(int, siginfo_t*, void*) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #6  
> #7  0x02c3bd7f in impala::OrcStructReader::NumElements 
> (this=0xf043290) at be/src/exec/orc-column-readers.h:603
> #8  0x02c371b7 in impala::OrcListReader::NumElements 
> (this=0x11009420) at be/src/exec/orc-column-readers.cc:563
> #9  0x02c371b7 in impala::OrcListReader::NumElements 
> (this=0x11009340) at be/src/exec/orc-column-readers.cc:563
> #10 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf043200) at be/src/exec/orc-column-readers.h:606
> #11 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf042ea0) at be/src/exec/orc-column-readers.h:606
> #12 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf042e10) at be/src/exec/orc-column-readers.h:606
> #13 0x02c3497f in impala::OrcStructReader::EndOfBatch 
> (this=0xf042e10) at be/src/exec/orc-column-readers.cc:294
> #14 0x02bf5389 in impala::HdfsOrcScanner::GetNextInternal 
> (this=0xeca4000, row_batch=0xf1c95a0) at be/src/exec/hdfs-orc-scanner.cc:648
> #15 0x02bf46b7 in impala::HdfsOrcScanner::ProcessSplit 
> (this=0xeca4000) at be/src/exec/hdfs-orc-scanner.cc:588
> #16 0x02d427ff in impala::HdfsScanNode::ProcessSplit (this=0xff85800, 
> filter_ctxs=..., expr_results_pool=0x7fd41a29b4e0, scan_range=0xf2bde00, 
> scanner_thread_reservation=0x7fd41a29b408) at 
> be/src/exec/hdfs-scan-node.cc:500
> #17 0x02d41b80 in impala::HdfsScanNode::ScannerThread 
> (this=0xff85800, first_thread=false, scanner_thread_reservation=16384) at 
> be/src/exec/hdfs-scan-node.cc:418
> #18 0x02d40ee8 in impala::HdfsScanNodeoperator()(void) 
> const (__closure=0x7fd41a29bc08) at be/src/exec/hdfs-scan-node.cc:339
> #19 0x02d43afb in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
>     at 
> /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #20 0x022de8ca in boost::function0::operator() 
> (this=0x7fd41a29bc00) at 
> /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #21 0x02aa43a0 in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (name=..., category=..., 
> functor=..., parent_thread_info=0x7fd40f8858a0, 
> thread_started=0x7fd40f8846a0) at be/src/util/thread.cc:360
> #22 0x02aacd01 in 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> s

[jira] [Created] (IMPALA-11059) Speed up zipping unnest by reading collection elements in batches

2021-12-13 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11059:
--

 Summary: Speed up zipping unnest by reading collection elements in 
batches 
 Key: IMPALA-11059
 URL: https://issues.apache.org/jira/browse/IMPALA-11059
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Try to speed up zipping unnest by reading collection elements in batches.

Currently we read from the collections row-wise: we read one element from 
each collection and store them in the corresponding columns of the current 
row, then proceed to the next element in each collection for the next row, and 
so on. The proposal is to fill the row batch column-wise instead, i.e. to fill 
the column corresponding to the first collection, then the column for the 
second collection, and so on.
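
The difference between the two fill orders can be sketched as follows. This is an illustrative Python model, not the backend implementation: the function names are hypothetical, rows are plain lists, and it assumes that shorter collections are padded with NULL (`None` here) up to the length of the longest one.

```python
def zip_unnest_row_wise(collections):
    """Build one row at a time, taking one element from every collection per row."""
    num_rows = max((len(c) for c in collections), default=0)
    rows = []
    for i in range(num_rows):
        rows.append([c[i] if i < len(c) else None for c in collections])
    return rows


def zip_unnest_column_wise(collections):
    """Pre-allocate the batch, then fill it one collection (column) at a time."""
    num_rows = max((len(c) for c in collections), default=0)
    rows = [[None] * len(collections) for _ in range(num_rows)]
    for col, collection in enumerate(collections):
        # Inner loop walks a single collection sequentially.
        for i, value in enumerate(collection):
            rows[i][col] = value
    return rows
```

Both functions produce the same batch; only the traversal order differs. In the column-wise version each collection is read in one sequential pass, which is the access pattern the proposal hopes will be faster.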



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11067) Unify struct subexpressions in rows

2022-01-04 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11067:
--

 Summary: Unify struct subexpressions in rows
 Key: IMPALA-11067
 URL: https://issues.apache.org/jira/browse/IMPALA-11067
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Daniel Becker


If a column is given multiple times in the select list, it is not duplicated 
under the hood in the row because we recognise that multiple columns in the 
result reference the same actual column, therefore the row size does not 
increase:

 
{code:java}
explain select id, outer_struct from 
functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct from 
functional_orc_def.complextypes_nested_structs
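+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+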
{code}
With the id column duplicated:

 
{code:java}
explain select id, id, outer_struct from 
functional_orc_def.complextypes_nested_structs;
Query: explain select id, id, outer_struct from 
functional_orc_def.complextypes_nested_structs
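+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+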
{code}

However, if we query a struct and a subfield of the same struct, we do not 
reuse the existing slot in the row but duplicate the subexpression, increasing 
the row size:

 
{code:java}
explain select id, outer_struct, outer_struct.inner_struct2 from 
functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct, outer_struct.inner_struct2 from 
functional_orc_def.complextypes_nested_structs
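+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=80B cardinality=5                                 |
+---------------------------------------------------------------+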
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11180) Ranger permission error

2022-03-11 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11180:
--

 Summary: Ranger permission error
 Key: IMPALA-11180
 URL: https://issues.apache.org/jira/browse/IMPALA-11180
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Riza Suminto


There are Ranger permission errors in a test in some internal builds, 
possibly caused by IMPALA-5256.

 
h3. Stacktrace
{code:java}
authorization/test_authorization.py:158: in test_ranger_show_stmts_with_select
    self._test_ranger_show_stmts_helper(unique_name, ['select'])
authorization/test_authorization.py:125: in _test_ranger_show_stmts_helper
    .format(priv, unique_name, priv, getuser()))
common/impala_connection.py:208: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION:
E    MESSAGE: InternalException: Error granting a privilege in Ranger. Ranger error message: Permission denied.
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11186) Assertion fails in TestShowCreateTable.test_show_create_table

2022-03-16 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11186:
--

 Summary: Assertion fails in 
TestShowCreateTable.test_show_create_table
 Key: IMPALA-11186
 URL: https://issues.apache.org/jira/browse/IMPALA-11186
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker


In internal builds, the test 
*metadata.test_show_create_table.TestShowCreateTable.test_show_create_table* 
fails with the following error:
h3. Error Message
{code:java}
metadata/test_show_create_table.py:64: in test_show_create_table
    unique_database)
metadata/test_show_create_table.py:118: in __run_show_create_table_test_case
    self.__compare_result(expected_result, create_table_result)
metadata/test_show_create_table.py:146: in __compare_result
    assert expected_tbl_props == actual_tbl_props
E   assert {'engine.hive...t': 'parquet'} == {'engine.hive71ac7bb', ...}
E     Omitting 4 identical items, use -v to show
E     Right contains more items:
E     {'uuid': '02004aff-d553-437e-8d2c-8b35f71ac7bb'}
E     Use -v to get the full diff
{code}
h3. Stacktrace
{code:java}
metadata/test_show_create_table.py:64: in test_show_create_table
    unique_database)
metadata/test_show_create_table.py:118: in __run_show_create_table_test_case
    self.__compare_result(expected_result, create_table_result)
metadata/test_show_create_table.py:146: in __compare_result
    assert expected_tbl_props == actual_tbl_props
E   assert {'engine.hive...t': 'parquet'} == {'engine.hive71ac7bb', ...}
E     Omitting 4 identical items, use -v to show
E     Right contains more items:
E     {'uuid': '02004aff-d553-437e-8d2c-8b35f71ac7bb'}
E     Use -v to get the full diff
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11187) TestKuduOperations.test_read_modes fails

2022-03-16 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11187:
--

 Summary: TestKuduOperations.test_read_modes fails
 Key: IMPALA-11187
 URL: https://issues.apache.org/jira/browse/IMPALA-11187
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker


In an internal build, *query_test.test_kudu.TestKuduOperations.test_read_modes* 
 fails:
h3. Error Message
{code:java}
query_test/test_kudu.py:551: in test_read_modes
    self._retry_query(cursor, "select count(*) from %s" % table_name, [(103,)])
query_test/test_kudu.py:535: in _retry_query
    assert retries < 3, \
E   AssertionError: Did not get a correct result for select count(*) from test_read_modes_53f93f33.test_read_latest after 3 retries: [(97,)]
E   assert 3 < 3
{code}
h3. Stacktrace
{code:java}
query_test/test_kudu.py:551: in test_read_modes
    self._retry_query(cursor, "select count(*) from %s" % table_name, [(103,)])
query_test/test_kudu.py:535: in _retry_query
    assert retries < 3, \
E   AssertionError: Did not get a correct result for select count(*) from test_read_modes_53f93f33.test_read_latest after 3 retries: [(97,)]
E   assert 3 < 3
{code}
h3. Standard Error
{code:java}
-- 2022-03-16 01:02:23,338 INFO MainThread: Using database testkuduoperations_1677_vu0h74 as default
SET client_identifier=query_test/test_kudu.py::TestKuduOperations::()::test_read_modes;
SET sync_ddl=False;
-- executing against localhost:21000
DROP DATABASE IF EXISTS `test_read_modes_53f93f33` CASCADE;
-- 2022-03-16 01:02:23,342 INFO MainThread: Started query 2e4741a082f8195b:c8427d1b
SET client_identifier=query_test/test_kudu.py::TestKuduOperations::()::test_read_modes;
SET sync_ddl=False;
-- executing against localhost:21000
CREATE DATABASE `test_read_modes_53f93f33`;
-- 2022-03-16 01:02:28,252 INFO MainThread: Started query c547f10a0d52e296:62a357c6
-- 2022-03-16 01:02:28,305 INFO MainThread: Created database "test_read_modes_53f93f33" for test ID "query_test/test_kudu.py::TestKuduOperations::()::test_read_modes"
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11188) Could not find artifact org.apache.ozone:ozone-filesystem-hadoop3

2022-03-16 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11188:
--

 Summary: Could not find artifact 
org.apache.ozone:ozone-filesystem-hadoop3
 Key: IMPALA-11188
 URL: https://issues.apache.org/jira/browse/IMPALA-11188
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker


In several internal builds compilation fails with the following error:
[ERROR] Failed to execute goal on project impala-frontend: Could not resolve dependencies for project org.apache.impala:impala-frontend:jar:3.4.0-SNAPSHOT: Could not find artifact org.apache.ozone:ozone-filesystem-hadoop3:jar:1.1.0.7.1.8.0-531 in nexus-repo ([http://nexus-private.hortonworks.com/nexus/content/groups/public]) -> [Help 1]
*08:55:33* 00:55:33 [ERROR]
*08:55:33* 00:55:33 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
*08:55:33* 00:55:33 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
*08:55:33* 00:55:33 [ERROR]
*08:55:33* 00:55:33 [ERROR] For more information about the errors and possible solutions, please read the following articles:
*08:55:33* 00:55:33 [ERROR] [Help 1] [http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException]
*08:55:33* 00:55:33 [ERROR]
*08:55:33* 00:55:33 [ERROR] After correcting the problems, you can resume the build with the command
*08:55:33* 00:55:33 [ERROR]   mvn  -rf :impala-frontend



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11193) Assertion fails in ClientCacheTest.MemLeak

2022-03-17 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11193:
--

 Summary: Assertion fails in ClientCacheTest.MemLeak
 Key: IMPALA-11193
 URL: https://issues.apache.org/jira/browse/IMPALA-11193
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Yida Wu


The test {*}ClientCacheTest.MemLeak{*}, introduced in IMPALA-11176, fails in 
several internal builds.
h3. Error Message
{code:java}
Expected: (mem_before) > (0), actual: 0 vs 0{code}
h3. Stacktrace
{code:java}
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/runtime/client-cache-test.cc:100
Expected: (mem_before) > (0), actual: 0 vs 0{code}
Interestingly, it is not the main assertion that fails but a precondition, 
namely EXPECT_GT(mem_before, 0).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (IMPALA-11227) FE OOM in TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props

2022-04-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11227.

Resolution: Fixed

> FE OOM in TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props
> --
>
> Key: IMPALA-11227
> URL: https://issues.apache.org/jira/browse/IMPALA-11227
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.0
>Reporter: Quanlong Huang
>Assignee: Daniel Becker
>Priority: Critical
>
> The huge values clause of the insert statement in 
> TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props could 
> cause FE OOM:
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5524/testReport/query_test.test_parquet_bloom_filter/TestParquetBloomFilter/test_fallback_from_dict_if_no_bloom_tbl_props_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]
> {code:bash}
> query_test/test_parquet_bloom_filter.py:176: in 
> test_fallback_from_dict_if_no_bloom_tbl_props
> False)
> query_test/test_parquet_bloom_filter.py:228: in _create_table_dict_overflow
> self.execute_query(insert_stmt, vector.get_value('exec_option'))
> common/impala_test_suite.py:836: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:868: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:961: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: 
> E    MESSAGE: OutOfMemoryError: GC overhead limit exceeded
> {code}
> impalad.INFO
> {code:java}
> I0404 14:24:30.203562 19115 Frontend.java:1871] 
> 7d4c91ed04f27bc4:d32f7826] Analyzing query: insert into 
> test_fallback_from_dict_if_no_bloom_tbl_props_a60c835b.fallback_from_dict 
> values 
> (0),(2),(4),(6),(8),(10),(12),(14),(16),(18),(20),(22),(24),(26),(28),(30),(32),(34),(36),(38),(40),(42),(44),(46),(48),(50),(52),(54),(56),(58),(60)...
> ...
> I0404 14:25:18.025733 19115 jni-util.cc:286] 
> 7d4c91ed04f27bc4:d32f7826] java.lang.OutOfMemoryError: GC overhead 
> limit exceeded
>         at 
> java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
>         at java.lang.StringBuilder.<init>(StringBuilder.java:89)
>         at 
> org.apache.impala.analysis.SelectListItem.toSql(SelectListItem.java:84)
>         at org.apache.impala.analysis.SelectStmt.toSql(SelectStmt.java:1235)
>         at 
> org.apache.impala.analysis.StatementBase.toSql(StatementBase.java:138)
>         at 
> org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:308)
>         at 
> org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:269)
>         at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:262)
>         at 
> org.apache.impala.analysis.SetOperationStmt$SetOperand.analyze(SetOperationStmt.java:102)
>         at 
> org.apache.impala.analysis.SetOperationStmt.analyzeOperands(SetOperationStmt.java:388)
>         at 
> org.apache.impala.analysis.SetOperationStmt.analyze(SetOperationStmt.java:318)
>         at org.apache.impala.analysis.UnionStmt.analyze(UnionStmt.java:49)
>         at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:306)
>         at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:506)
>         at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468)
>         at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2012)
>         at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1920)
>         at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1744)
>         at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code}
> I saw this twice in another ubuntu-16.04-dockerised-tests job:
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5523/testReport/junit/query_test.test_par

[jira] [Created] (IMPALA-11242) Impala cluster doesn't start when building with debug_noopt

2022-04-13 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11242:
--

 Summary: Impala cluster doesn't start when building with 
debug_noopt
 Key: IMPALA-11242
 URL: https://issues.apache.org/jira/browse/IMPALA-11242
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Daniel Becker
Assignee: Daniel Becker


After building Impala with buildall.sh using the -debug_noopt option, the 
Impala cluster cannot be started:



 
{code:java}
./buildall.sh -debug_noopt
[...]
bin/start-impala-cluster.py 
Traceback (most recent call last):
  File "bin/start-impala-cluster.py", line 166, in <module>
    KUDU_RPC_TIMEOUT = build_flavor_timeout(0, slow_build_timeout=6)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 416, in 
build_flavor_timeout
    cluster_properties = ImpalaTestClusterProperties.get_instance()
  File "/home/danielbecker/Impala/tests/common/environ.py", line 254, in 
get_instance
ImpalaTestClusterFlagsDetector.detect_using_build_root_or_web_ui(IMPALA_HOME)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 175, in 
detect_using_build_root_or_web_ui
    ImpalaTestClusterFlagsDetector.validate_build_flags(build_type, 
library_link_type)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 196, in 
validate_build_flags
    raise Exception("Unknown build type {0}".format(build_type))
Exception: Unknown build type debug_noopt
{code}
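
The failure mode in the traceback can be reproduced with a small sketch. This is not the code in tests/common/environ.py: the set of accepted build flavours and the helper names below are hypothetical, and the sketch only assumes that the validator rejects any flavour missing from a fixed list, as the error message suggests.

```python
# Hypothetical list of accepted build flavours; the real check lives in
# tests/common/environ.py and its accepted values may differ.
KNOWN_BUILD_TYPES = frozenset(["debug", "release", "asan", "tsan", "ubsan"])


def validate_build_type(build_type, known_types=KNOWN_BUILD_TYPES):
    """Return build_type unchanged, or raise if it is not a recognised flavour."""
    if build_type not in known_types:
        raise Exception("Unknown build type {0}".format(build_type))
    return build_type


def try_validate(build_type, known_types=KNOWN_BUILD_TYPES):
    """Return None on success, otherwise the error message as a string."""
    try:
        validate_build_type(build_type, known_types)
    except Exception as e:
        return str(e)
    return None
```

Under these assumptions, `try_validate("debug_noopt")` returns the same "Unknown build type debug_noopt" message, and passing `KNOWN_BUILD_TYPES | {"debug_noopt"}` as `known_types` makes the check succeed. Whether the actual fix registers the flavour in this way is not stated here.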
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (IMPALA-11242) Impala cluster doesn't start when building with debug_noopt

2022-04-19 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11242.

Resolution: Fixed

> Impala cluster doesn't start when building with debug_noopt
> ---
>
> Key: IMPALA-11242
> URL: https://issues.apache.org/jira/browse/IMPALA-11242
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> After building Impala with  buildall.sh using the -debug_noopt option, the 
> Impala cluster cannot be started:
>  
> {code:java}
> ./buildall.sh -debug_noopt
> [...]
> bin/start-impala-cluster.py 
> Traceback (most recent call last):
>   File "bin/start-impala-cluster.py", line 166, in <module>
>     KUDU_RPC_TIMEOUT = build_flavor_timeout(0, slow_build_timeout=6)
>   File "/home/danielbecker/Impala/tests/common/environ.py", line 416, in 
> build_flavor_timeout
>     cluster_properties = ImpalaTestClusterProperties.get_instance()
>   File "/home/danielbecker/Impala/tests/common/environ.py", line 254, in 
> get_instance
> ImpalaTestClusterFlagsDetector.detect_using_build_root_or_web_ui(IMPALA_HOME)
>   File "/home/danielbecker/Impala/tests/common/environ.py", line 175, in 
> detect_using_build_root_or_web_ui
>     ImpalaTestClusterFlagsDetector.validate_build_flags(build_type, 
> library_link_type)
>   File "/home/danielbecker/Impala/tests/common/environ.py", line 196, in 
> validate_build_flags
>     raise Exception("Unknown build type {0}".format(build_type))
> Exception: Unknown build type debug_noopt
> {code}
>  





[jira] [Resolved] (IMPALA-10839) NULL values are displayed on a wrong level for nested structs (ORC)

2022-04-21 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10839.

Resolution: Fixed

> NULL values are displayed on a wrong level for nested structs (ORC)
> ---
>
> Key: IMPALA-10839
> URL: https://issues.apache.org/jira/browse/IMPALA-10839
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Daniel Becker
>Priority: Major
>  Labels: ORC, complextype, correctness, nested_types, scanner
>
> When querying a non-top-level nested struct, the NULL values are displayed 
> at an incorrect level. E.g.:
> {code:java}
> select id, outer_struct.inner_struct3 from 
> functional_orc_def.complextypes_nested_structs where id >= 4;
> {code}
> {code:java}
> +----+----------------------------+
> | id | outer_struct.inner_struct3 |
> +----+----------------------------+
> | 4  | {"s":{"i":null,"s":null}}  |
> | 5  | {"s":null}                 |
> +----+----------------------------+
> {code}
> However, in the first row the expected result is that 's' is null rather than 
> its members, and in the second row the result should be 'NULL'.
> For reference see what is returned when querying 'outer_struct' instead of 
> 'outer_struct.inner_struct3':
> {code:java}
> ++---+
> | 4  | 
> {"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}
>  |
> | 5  | 
> {"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}   
> |
> ++---+
> {code}
> Note: this issue occurs with the ORC format.
> After some digging I found that these incorrect null values are already 
> present in the ORC scanner where OrcStructReader reads the rows in 
> ReadValue() and ReadValueBatch() functions.
> As a first step it would be nice to verify that the external ORC reader we 
> use for reading the actual values from the files gives correct results.





[jira] [Resolved] (IMPALA-10838) Error when struct returned from WITH() and used in an ORDER BY

2022-05-02 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10838.

Resolution: Fixed

> Error when struct returned from WITH() and used in an ORDER BY
> --
>
> Key: IMPALA-10838
> URL: https://issues.apache.org/jira/browse/IMPALA-10838
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Daniel Becker
>Priority: Major
>  Labels: complextype, nested_types
>
> {code:java}
> with sub as (
> select id, small_struct
> from functional_orc_def.complextypes_structs
> where length(small_struct.s) > 5)
> select sub.id, sub.small_struct from sub order by sub.small_struct.i desc;
> {code}
> The above query results in an error when trying to run SlotRef.toThrift():
> {code:java}
> ERROR: IllegalStateException: Illegal reference to non-materialized tuple: 
> debugname=InlineViewRef sub alias=sub tid=2
> {code}
> If I rewrite the query a bit to return the member of the struct from the 
> inline view (WITH()) and use this in the ORDER BY, then the query succeeds as 
> expected:
> {code:java}
> with sub as (
> select id, small_struct, small_struct.i as si
> from functional_orc_def.complextypes_structs where small_struct.i > 19200)
> select sub.id, sub.small_struct from sub order by sub.si desc;
> {code}
> In SortNode.toThrift() I checked what the sort exprs and the resolved tuple 
> exprs are and I see a difference that could be the cause.
>  In the problematic case:
> {code:java}
> - sort exprs in SortNode:
>   SlotRef{label=small_struct.i, type=INT, id=15}
> - resolved exprs in SortNode: 
>   SlotRef{label=id, path=id, type=INT, id=0} 
>   SlotRef{label=small_struct, path=small_struct, type=STRUCT, 
> id=1} 
>   *SlotRef{label=sub.small_struct.i, path=sub.small_struct.i, type=INT, 
> id=10}*
> {code}
> In the successful case:
> {code:java}
> - sort exprs in SortNode: 
>   SlotRef{label=si, type=INT, id=14}
> - resolved exprs in SortNode: 
>   SlotRef{label=id, path=id, type=INT, id=0} 
>   SlotRef{label=small_struct, path=small_struct, type=STRUCT, 
> id=1} 
>   *SlotRef{label=small_struct.i, path=small_struct.i, type=INT, id=4}*
> {code}





[jira] [Resolved] (IMPALA-11067) Unify struct subexpressions in rows

2022-05-02 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11067.

Resolution: Fixed

> Unify struct subexpressions in rows
> ---
>
> Key: IMPALA-11067
> URL: https://issues.apache.org/jira/browse/IMPALA-11067
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: complextype, nested_types
>
> If a column is given multiple times in the select list, it is not duplicated 
> under the hood in the row because we recognise that multiple columns in the 
> result reference the same actual column, therefore the row size does not 
> increase:
>  
> {code:java}
> explain select id, outer_struct from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct from 
> functional_orc_def.complextypes_nested_structs
> +---+
> | Explain String                                                |
> +---+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---+
> {code}
> With the id column duplicated:
>  
> {code:java}
> explain select id, id, outer_struct from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, id, outer_struct from 
> functional_orc_def.complextypes_nested_structs
> +---+
> | Explain String                                                |
> +---+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---+
> {code}
> However, if we query a struct and a subfield of the same struct, we do not 
> reuse the existing slot in the row but duplicate the subexpression, 
> increasing the row size:
>  
> {code:java}
> explain select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs
> +---+
> | Explain String                                                |
> +---+
> | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=80B cardinality=5                                 |
> +---+
> {code}
>  
>  





[jira] [Resolved] (IMPALA-9470) Use Parquet bloom filters

2022-05-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9470.
---
Resolution: Implemented

> Use Parquet bloom filters
> -
>
> Key: IMPALA-9470
> URL: https://issues.apache.org/jira/browse/IMPALA-9470
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: parquet
>
> PARQUET-41 has been closed recently. This means Parquet-MR is capable of 
> writing and reading bloom filters.
> Currently bloom filters are per column chunk entries, i.e. with their help we 
> can filter out entire row groups.
> We already filter row groups in HdfsParquetScanner::NextRowGroup() based on 
> column chunk statistics and dictionaries. Skipping row groups based on bloom 
> filters could also be added to this function.
> Impala could also write bloom filters.
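
To illustrate the row-group skipping idea described above, here is a minimal sketch. It uses a toy Bloom filter with two hash probes, which is an assumption made for brevity and is not the filter format defined by Parquet; the names `ToyBloomFilter` and `CanSkipRowGroup` are hypothetical and do not come from Impala.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Toy Bloom filter over 64-bit integers (illustrative only, not Parquet's format).
class ToyBloomFilter {
 public:
  void Insert(int64_t v) {
    bits_.set(Index1(v));
    bits_.set(Index2(v));
  }

  // false means "definitely absent"; true means "possibly present".
  bool MayContain(int64_t v) const {
    return bits_.test(Index1(v)) && bits_.test(Index2(v));
  }

 private:
  static constexpr std::size_t kBits = 1024;

  static std::size_t Index1(int64_t v) {
    return std::hash<int64_t>{}(v) % kBits;
  }

  static std::size_t Index2(int64_t v) {
    // Unsigned arithmetic so the mixing step cannot overflow.
    const uint64_t mixed = static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL;
    return std::hash<uint64_t>{}(mixed) % kBits;
  }

  std::bitset<kBits> bits_;
};

// For a predicate 'col = value', a row group can be skipped when the filter
// of its column chunk rules the value out.
bool CanSkipRowGroup(const ToyBloomFilter& filter, int64_t value) {
  return !filter.MayContain(value);
}

// Example usage; the asserts document the expected behaviour.
void Example() {
  ToyBloomFilter filter;
  assert(CanSkipRowGroup(filter, 42));   // nothing inserted yet
  filter.Insert(42);
  assert(!CanSkipRowGroup(filter, 42));  // no false negatives
}
```

Because a Bloom filter never gives false negatives, skipping on a negative answer cannot drop matching rows; a positive answer only means the row group still has to be read.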





[jira] [Resolved] (IMPALA-10929) Optimise memory usage of structs in tuples

2022-05-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10929.

Resolution: Duplicate

> Optimise memory usage of structs in tuples
> --
>
> Key: IMPALA-10929
> URL: https://issues.apache.org/jira/browse/IMPALA-10929
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> If we have both a whole struct and one of its members (or a member of a 
> member etc.) in the select list, the whole struct and the member are assigned 
> to different slots in the tuple. We could use less memory if the member 
> expression used the slot within the whole struct instead.
> Example:
> For the query 
> {code:java}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs;
> {code}
> the row size is 64B, while for
> {code:java}
> select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs;
> {code}
> it is 80B, although it should not need more memory.
> It is not limited to the select list, it should also work with where clauses 
> etc., for example
> {code:java}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs 
> where outer_struct.inner_struct2.i > 1;
> {code}
> should also have a row size of 64B instead of 68B.





[jira] [Created] (IMPALA-11365) Dereferencing null pointer in TopNNode

2022-06-16 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11365:
--

 Summary: Dereferencing null pointer in TopNNode
 Key: IMPALA-11365
 URL: https://issues.apache.org/jira/browse/IMPALA-11365
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


In the constructor of TopNNode, if {{pnode.partition_comparator_config_}} is 
NULL, we initialise {{partition_cmp_}} with a NULL pointer. However, when 
initialising {{partition_heaps_}}, we dereference {{partition_cmp_}} 
because {{ComparatorWrapper}} expects a reference.

This has so far not led to a crash because in this case the comparator of 
{{partition_heaps_}} is not used, but dereferencing a NULL pointer to bind a 
reference is undefined behaviour.
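
A minimal sketch of the pattern described above, using hypothetical stand-in types ({{Comparator}}, {{ComparatorWrapper}} and {{MakeWrapper}} here are simplified illustrations, not Impala's actual classes). One possible way to avoid the problem is to build the wrapper only when a comparator exists; this is an assumption for illustration, not necessarily the fix that was applied.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a row comparator.
struct Comparator {
  bool Less(int a, int b) const { return a < b; }
};

// Hypothetical wrapper that, like the one in the report, stores a reference.
struct ComparatorWrapper {
  explicit ComparatorWrapper(const Comparator& cmp) : cmp_(cmp) {}
  bool operator()(int a, int b) const { return cmp_.Less(a, b); }
  const Comparator& cmp_;
};

// Problematic pattern: if 'cmp' is null, '*cmp' is evaluated to bind the
// reference, which is undefined behaviour even if the wrapper is never called:
//   ComparatorWrapper wrapper(*cmp);

// Safer alternative: only build the wrapper when a comparator actually exists.
std::optional<ComparatorWrapper> MakeWrapper(const Comparator* cmp) {
  if (cmp == nullptr) return std::nullopt;
  return ComparatorWrapper(*cmp);
}

// Example usage; the asserts document the expected behaviour.
void Example() {
  Comparator cmp;
  std::optional<ComparatorWrapper> wrapper = MakeWrapper(&cmp);
  assert(wrapper.has_value());
  assert((*wrapper)(1, 2));
  assert(!MakeWrapper(nullptr).has_value());
}
```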





[jira] [Resolved] (IMPALA-11365) Dereferencing null pointer in TopNNode

2022-06-22 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11365.

Resolution: Fixed

> Dereferencing null pointer in TopNNode
> --
>
> Key: IMPALA-11365
> URL: https://issues.apache.org/jira/browse/IMPALA-11365
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In the constructor of TopNNode, if {{pnode.partition_comparator_config_}} is 
> NULL, we initialise {{partition_cmp_}} with a NULL pointer. However, when 
> initialising {{partition_heaps_}}, we dereference {{partition_cmp_}} 
> because {{ComparatorWrapper}} expects a reference.
> This has so far not led to a crash because in this case the comparator of 
> {{partition_heaps_}} is not used, but dereferencing a NULL pointer to bind a 
> reference is undefined behaviour.





[jira] [Created] (IMPALA-11410) Codegen crashes instead of reporting corrupt function

2022-07-01 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11410:
--

 Summary: Codegen crashes instead of reporting corrupt function
 Key: IMPALA-11410
 URL: https://issues.apache.org/jira/browse/IMPALA-11410
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker


In {{FragmentState::CodegenHelper}} we call {{plan_tree_->Codegen(this)}} and 
{{sink_config_->Codegen(this)}} but the status of codegenning is discarded (or 
only used in the profile). If codegen fails because of a bug and the generated 
functions fail verification, {{LlvmCodeGen::is_corrupt_}} is set to true, which 
means all further functions will fail verification too. This can lead to 
{{LlvmCodeGen::GetHashFunction}} returning {{NULL}}, but in 
{{HashTableCtx::CodegenHashRow}} we dereference this {{NULL}} pointer, causing 
a crash. See 
[https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/exec/hash-table.cc#L1043]
 (the pointer in question is {{hash_fn}}).

This situation only arises if there is already a bug in code generation, but if 
the codegen bug is in a {{ScalarExpr}}, for example {{SlotRef}}, we 
return an error message instead of crashing. See 
{{FragmentState::CodegenHelper}} for how these cases are handled differently.

It would help debugging if we handled these cases uniformly, by returning an 
error message.
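
The uniform handling suggested above could look roughly like the sketch below. The {{Status}} struct and the function signatures are simplified, hypothetical stand-ins (Impala's real {{Status}} class and codegen API differ); the point is only that a NULL result is checked and turned into an error instead of being dereferenced.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal status type; Impala's real Status class is richer.
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status Error(std::string m) { return {false, std::move(m)}; }
};

using HashFn = int (*)(int);

int IdentityHash(int v) { return v; }

// Stand-in for a lookup that yields NULL once the codegen module is corrupt.
HashFn GetHashFunction(bool is_corrupt) {
  return is_corrupt ? nullptr : &IdentityHash;
}

// Checks the pointer and reports an error rather than dereferencing it.
Status CodegenHashRow(bool is_corrupt, HashFn* out) {
  HashFn hash_fn = GetHashFunction(is_corrupt);
  if (hash_fn == nullptr) {
    return Status::Error("Failed to get hash function: codegen module is corrupt");
  }
  *out = hash_fn;
  return Status::OK();
}

// Example usage; the asserts document the expected behaviour.
void Example() {
  HashFn fn = nullptr;
  assert(CodegenHashRow(/*is_corrupt=*/false, &fn).ok);
  assert(fn(7) == 7);

  HashFn unused = nullptr;
  Status s = CodegenHashRow(/*is_corrupt=*/true, &unused);
  assert(!s.ok && unused == nullptr);
}
```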

Steps to reproduce:
1. Introduce an error in {{FilterContext::CodegenEval}} by deleting a 
{{CreateBr}} call
2. Run the following query:
{code:sql}
select a.outer_struct.inner_struct2.i, b.small_struct.i
from functional_orc_def.complextypes_nested_structs a
inner join functional_orc_def.complextypes_structs b
on b.small_struct.i = a.outer_struct.inner_struct2.i + 19091
{code}





[jira] [Created] (IMPALA-11412) CodegenFnPtr::store() has a compile time error when instantiated

2022-07-04 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11412:
--

 Summary: CodegenFnPtr::store() has a compile time error 
when instantiated
 Key: IMPALA-11412
 URL: https://issues.apache.org/jira/browse/IMPALA-11412
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


The function template {{CodegenFnPtr::store()}} tries to implicitly 
cast a function pointer (of type {{FuncType}}) to {{void*}}, which is 
a compile time error. The reason this didn't come up in the builds is that this 
function template is currently not used anywhere, and the function pointers are 
stored through the parent class, using {{CodegenFnPtrBase::store()}}, which 
takes a {{void*}}.

We should either 
 # remove the hitherto unused {{CodegenFnPtr::store()}} function 
template 
OR
 # add the correct explicit cast from function pointer to {{void*}} AND add a 
test which instantiates (and tests) this function template so we can be sure 
that the new implementation is correct.

I'm inclined to choose the second option because I think the interface of 
{{CodegenFnPtr}} is more complete if we have this function as well, 
even if it is currently not used.

Note:
After digging a bit on the internet I found that the reason an implicit 
function pointer to {{void*}} cast is not allowed (as opposed to an implicit 
regular pointer to {{void*}} cast) is that the standard doesn't guarantee 
that regular and function pointers have the same size, and there are some 
architectures where they actually don't.

However, according to 8) on 
[https://en.cppreference.com/w/cpp/language/reinterpret_cast], POSIX 
compliant systems do have this guarantee, so it shouldn't be a problem that 
we store function pointers as {{void*}}. We don't really have a choice 
because LLVM does the same, as 
{{llvm::ExecutionEngine::getPointerToFunction()}} returns a {{void*}} (see 
[https://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html#acc46759a6acfc3d116c3f22110326ffa]);
 we call that function 
[here|https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/codegen/llvm-codegen.cc#L1315].
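
A minimal sketch of the second option, assuming a simplified holder class ({{FnPtrHolder}} is hypothetical and omits whatever else the real {{CodegenFnPtr}} does). It shows the explicit {{reinterpret_cast}} in both directions, together with the kind of test that would instantiate the template.

```cpp
#include <cassert>

// Hypothetical, simplified holder for a JIT-compiled function pointer.
template <typename FuncType>
class FnPtrHolder {
 public:
  // 'ptr_ = fn;' would not compile: function pointers do not convert
  // implicitly to void*. The conversion has to be an explicit cast.
  void store(FuncType fn) { ptr_ = reinterpret_cast<void*>(fn); }

  FuncType load() const { return reinterpret_cast<FuncType>(ptr_); }

 private:
  void* ptr_ = nullptr;
};

int AddOne(int x) { return x + 1; }

// Instantiates the template so that a broken cast is caught at build time.
void Example() {
  FnPtrHolder<int (*)(int)> holder;
  holder.store(&AddOne);
  assert(holder.load()(41) == 42);
}
```

The cast between function pointers and {{void*}} is only conditionally supported by the C++ standard, so this relies on the POSIX guarantee mentioned above.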





[jira] [Created] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children

2022-07-05 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11416:
--

 Summary: SlotRef::tuple_is_nullable_ uninitialised for struct 
children
 Key: IMPALA-11416
 URL: https://issues.apache.org/jira/browse/IMPALA-11416
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the 
{{SlotRef}} is not within a struct:
{code:cpp}
if (!slot_desc_->parent()->isTupleOfStructSlot()) {
  tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_);
}
{code}
https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103

Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined 
behaviour when it is read.
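
A minimal sketch of one way to make the read well defined, using a default member initialiser. {{SlotRefSketch}} is a hypothetical stand-in, the two {{bool}} parameters replace the real descriptor lookups, and the default value {{false}} is an assumption; the actual fix may differ.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an expression node; not Impala's SlotRef.
class SlotRefSketch {
 public:
  // 'in_struct' plays the role of the isTupleOfStructSlot() check and
  // 'nullable' the result of the TupleIsNullable() lookup.
  void Init(bool in_struct, bool nullable) {
    if (!in_struct) tuple_is_nullable_ = nullable;
    // When in_struct is true the member keeps its default value.
  }

  bool tuple_is_nullable() const { return tuple_is_nullable_; }

 private:
  // Default member initialiser: the value is defined on every code path.
  bool tuple_is_nullable_ = false;
};

// Example usage; the asserts document the expected behaviour.
void Example() {
  SlotRefSketch top_level;
  top_level.Init(/*in_struct=*/false, /*nullable=*/true);
  assert(top_level.tuple_is_nullable());

  SlotRefSketch struct_child;
  struct_child.Init(/*in_struct=*/true, /*nullable=*/true);
  assert(!struct_child.tuple_is_nullable());  // default value, not garbage
}
```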





[jira] [Resolved] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children

2022-07-06 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11416.

Resolution: Fixed

> SlotRef::tuple_is_nullable_ uninitialised for struct children
> -
>
> Key: IMPALA-11416
> URL: https://issues.apache.org/jira/browse/IMPALA-11416
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the 
> {{SlotRef}} is not within a struct:
> {code:cpp}
> if (!slot_desc_->parent()->isTupleOfStructSlot()) {
>   tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_);
> }
> {code}
> https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103
> Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined 
> behaviour when it is read.





[jira] [Closed] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children

2022-07-06 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker closed IMPALA-11416.
--

> SlotRef::tuple_is_nullable_ uninitialised for struct children
> -
>
> Key: IMPALA-11416
> URL: https://issues.apache.org/jira/browse/IMPALA-11416
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the 
> {{SlotRef}} is not within a struct:
> {code:cpp}
> if (!slot_desc_->parent()->isTupleOfStructSlot()) {
>   tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_);
> }
> {code}
> https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103
> Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined 
> behaviour when it is read.





[jira] [Resolved] (IMPALA-11412) CodegenFnPtr::store() has a compile time error when instantiated

2022-07-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11412.

Resolution: Fixed

> CodegenFnPtr::store() has a compile time error when instantiated
> --
>
> Key: IMPALA-11412
> URL: https://issues.apache.org/jira/browse/IMPALA-11412
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> The function template {{CodegenFnPtr::store()}} tries to implicitly 
> cast a function pointer (of type {{FuncType}}) to {{void*}}, which is a 
> compile time error. The reason this didn't come up in the builds is that this 
> function template is currently not used anywhere, and the function pointers 
> are stored through the parent class, using {{CodegenFnPtrBase::store()}}, 
> which takes a {{void*}}.
> We should either 
>  # remove the hitherto unused {{CodegenFnPtr::store()}} function 
> template 
> OR
>  # add the correct explicit cast from function pointer to {{void*}} AND add a 
> test which instantiates (and tests) this function template so we can be sure 
> that the new implementation is correct.
> I'm inclined to choose the second option because I think the interface of 
> {{CodegenFnPtr}} is more complete if we have this function as well, 
> even if it is currently not used.
> Note:
> After digging a bit on the internet I found that the reason that implicit 
> function pointer to {{void*}} cast is not allowed (as opposed to implicit 
> regular pointer to {{void*}}) is because the standard doesn't guarantee that 
> regular and function pointers have the same size, and there are some 
> architectures where they actually don't.
> However, according to 8) on 
> [https://en.cppreference.com/w/cpp/language/reinterpret_cast], POSIX 
> compliant systems do have this guarantee, so it shouldn't be a problem that 
> we store function pointers as {{void*}}. We don't really have a choice 
> because LLVM does the same, as 
> {{llvm::ExecutionEngine::getPointerToFunction()}} returns a {{void*}} (see 
> [https://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html#acc46759a6acfc3d116c3f22110326ffa]);
>  we call that function 
> [here|https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/codegen/llvm-codegen.cc#L1315].





[jira] [Created] (IMPALA-11425) Python TypeError: super() takes at least 1 argument (0 given)

2022-07-13 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11425:
--

 Summary: Python TypeError: super() takes at least 1 argument (0 
given)
 Key: IMPALA-11425
 URL: https://issues.apache.org/jira/browse/IMPALA-11425
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker


The following error happens in various builds during tarball creation:

{code:java}
Traceback (most recent call last):
  File "setup.py", line 167, in 
'Topic :: Database :: Front-Ends'
  File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
  File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
  File 
"/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/command/sdist.py",
 line 153, in run
self.run_command(cmd_name)
  File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 970, in run_command
cmd_obj = self.get_command_obj(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 845, in get_command_obj
klass = self.get_command_class(command)
  File "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/dist.py", 
line 410, in get_command_class
return _Distribution.get_command_class(self, command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 815, in get_command_class
__import__ (module_name)
  File "/usr/lib64/python2.7/distutils/command/check.py", line 13, in 
from docutils.utils import Reporter
  File 
"/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", line 
123, in 
release=True  # True for official releases and pre-releases
  File 
"/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", line 
93, in __new__
return super().__new__(cls, major, minor, micro,
TypeError: super() takes at least 1 argument (0 given)
{code}






[jira] [Created] (IMPALA-11427) TestOrcStats.test_orc_stats fails

2022-07-13 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11427:
--

 Summary: TestOrcStats.test_orc_stats fails
 Key: IMPALA-11427
 URL: https://issues.apache.org/jira/browse/IMPALA-11427
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


In one of the builds, query_test.test_orc_stats.TestOrcStats.test_orc_stats 
fails:

{code:java}
query_test/test_orc_stats.py:40: in test_orc_stats
self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
common/impala_test_suite.py:820: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:665: in verify_runtime_profile
% (function, field, expected_value, actual_value, op, actual))
E   AssertionError: Aggregation of SUM over RowsRead did not match expected 
results.
E   EXPECTED VALUE:
E   5
E   
E   
E   ACTUAL VALUE:
E   0
{code}






[jira] [Created] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build

2022-07-14 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11431:
--

 Summary: 
TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an 
exhaustive build
 Key: IMPALA-11431
 URL: https://issues.apache.org/jira/browse/IMPALA-11431
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


In one of the exhaustive builds, 
query_test.test_nested_types.TestComputeStatsWithNestedTypes.test_compute_stats_with_structs
 fails:

{code:java}
query_test/test_nested_types.py:252: in test_compute_stats_with_structs
self.run_test_case('QueryTest/compute-stats-with-structs', vector)
common/impala_test_suite.py:778: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:588: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:469: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:278: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
'alltypes','STRUCT',-1,-1,-1,-1.0,-1,-1
 == 
'alltypes','STRUCT',-1,-1,-1,-1,-1,-1
E 'id','INT',6,0,4,4.0,-1,-1 != 'id','INT',-1,-1,4,4,-1,-1
E 'small_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 
'small_struct','STRUCT',-1,-1,-1,-1,-1,-1
E 'str','STRING',6,0,11,10.330154,-1,-1 != 
'str','STRING',-1,-1,-1,-1,-1,-1
E 'tiny_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 
'tiny_struct','STRUCT',-1,-1,-1,-1,-1,-1
{code}






[jira] [Created] (IMPALA-11432) TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup

2022-07-14 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11432:
--

 Summary: TestRanger.test_grant_revoke_with_role fails with impalad 
stuck at startup
 Key: IMPALA-11432
 URL: https://issues.apache.org/jira/browse/IMPALA-11432
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


In one of the exhaustive builds 
authorization.test_ranger.TestRanger.test_grant_revoke_with_role failed with 
one of the impalads stuck during startup:

Stacktrace

{code:java}
common/custom_cluster_test_suite.py:181: in setup_method
self._start_impala_cluster(cluster_args, **kwargs)
common/custom_cluster_test_suite.py:285: in _start_impala_cluster
check_call(cmd + options, close_fds=True)
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/Impala-Toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py:190:
 in check_call
raise CalledProcessError(retcode, cmd)
E   CalledProcessError: Command 
'['/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py',
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
'--num_coordinators=3', 
'--log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests',
 '--log_level=1', '--impalad_args=--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger 
', '--state_store_args=None ', '--catalogd_args=--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger 
', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
{code}

Standard Error

{code:java}
-- 2022-07-14 01:07:04,943 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad_args=--server-name=server1 --ranger_service_type=hive 
--ranger_app_id=impala --authorization_provider=ranger 
--use_customized_user_groups_mapper_for_ranger ' '--state_store_args=None ' 
'--catalogd_args=--server-name=server1 --ranger_service_type=hive 
--ranger_app_id=impala --authorization_provider=ranger 
--use_customized_user_groups_mapper_for_ranger ' 
--impalad_args=--default_query_options=
01:07:05 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
01:07:05 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO
01:07:05 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
01:07:05 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO
01:07:05 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
01:07:05 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
01:07:08 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:09 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:10 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:11 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:12 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:13 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:14 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:15 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:16 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:17 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:18 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
01:07:18 MainThread: Error starting cluster
Traceback (most recent call last):
  File 
"/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py",
 line 840, in <module>
expected_cluster_size - expected_catalog_delays)
  File 
"/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tes

[jira] [Created] (IMPALA-11443) Possible overflow in SortNode.java

2022-07-18 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11443:
--

 Summary: Possible overflow in SortNode.java
 Key: IMPALA-11443
 URL: https://issues.apache.org/jira/browse/IMPALA-11443
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Daniel Becker


At line
https://github.com/apache/impala/blob/d029ae53676c8637bc4f56b80b331920c8289108/fe/src/main/java/org/apache/impala/planner/SortNode.java#L514

the following precondition check was triggered in ResourceProfileBuilder:

{code:java}
Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set");
{code}

See 
https://github.com/apache/impala/blob/d029ae53676c8637bc4f56b80b331920c8289108/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java#L79

This may be due to corrupt statistics but further investigation is necessary.
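
The ticket does not establish the cause, but one way a check like this can fail is signed 64-bit overflow. The sketch below emulates Java `long` arithmetic in Python to show how multiplying two large positive values can wrap to a negative number, and how a saturating multiply would avoid that. The function names and the inputs (`cardinality`, `row_size_bytes`) are hypothetical and are not taken from SortNode.java.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def wrap_int64(value):
    """Reduce a Python int to the two's-complement range of a Java long."""
    return (value + 2 ** 63) % 2 ** 64 - 2 ** 63


def java_multiply(a, b):
    """Multiply the way Java does for long operands: silently wrap."""
    return wrap_int64(a * b)


def saturating_multiply(a, b):
    """Multiply and clamp to the long range instead of wrapping."""
    return max(INT64_MIN, min(INT64_MAX, a * b))


# Hypothetical inputs, e.g. a cardinality inflated by corrupt statistics.
cardinality = 4 * 10 ** 12
row_size_bytes = 4 * 10 ** 6

wrapped = java_multiply(cardinality, row_size_bytes)
clamped = saturating_multiply(cardinality, row_size_bytes)

print("wrapped estimate is negative:", wrapped < 0)
print("clamped estimate is INT64_MAX:", clamped == INT64_MAX)
```

A wrapped, negative estimate would fail a `memEstimateBytes_ >= 0` precondition, while a clamped one would pass it.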



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11345) Query failed when creating equal conjunction map for Parquet bloom filter

2022-08-01 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11345.

Resolution: Fixed

> Query failed when creating equal conjunction map for Parquet bloom filter
> -
>
> Key: IMPALA-11345
> URL: https://issues.apache.org/jira/browse/IMPALA-11345
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 4.1.0
> Environment: CentOS-7, Impala-4.1
>Reporter: Yuchen Fan
>Assignee: Daniel Becker
>Priority: Critical
>
> When querying a Hive table to which columns were added without using 
> 'cascade', Impala encounters an error like "Unable to find SchemaNode for 
> path 'db.table.column' in the schema of file 
> 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet 
> file named in the error log and found that its schema is not compatible with 
> the table metadata. The call stack is attached below. Paths and table names 
> are masked: 
> I0609 18:04:25.970052 115413 status.cc:129] 
> c94d0ab3fdf8f943:320300610002] Unable to find SchemaNode for path 
> 'xxx_db.xxx_table.xxx_column' in the schema of file 
> 'hdfs://xxx_nn/xxx_table_path/00_0'.
>     @           0xea543b  impala::Status::Status()
>     @          0x1e3225c  
> impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
>     @          0x1e363ea  impala::HdfsParquetScanner::Open()
>     @          0x19b40d0  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
>     @          0x1b5cbae  impala::HdfsScanNode::ProcessSplit()
>     @          0x1b5e12a  impala::HdfsScanNode::ScannerThread()
>     @          0x1b5e9c6  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x18eafa9  impala::Thread::SuperviseThread()
>     @          0x18ee11a  boost::detail::thread_data<>::run()
>     @          0x2385510  thread_proxy
>     @     0x7fb5b0745162  start_thread
>     @     0x7fb5ad21df6c  __clone{code}
> The error may be related to 
> [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640]. The Bloom 
> filter requires that the right-hand values of the equality conjuncts match 
> the current file schema. The filter is unavailable if the column is missing 
> from any of the Parquet files scanned. I think we can disable the Parquet 
> Bloom filter for this single query or scan node when such a situation is 
> discovered.
> How to reproduce (using impala-shell):
>  # create table parquet_test (id INT) stored as parquet;
>  # insert into parquet_test values (1),(2),(3);
>  # alter table parquet_test add columns (name STRING);
>  # insert into parquet_test values (4, "James");
>  # select * from parquet_test where name in ("Lily");
>  # Error occurred.
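
The workaround suggested above (skip the Bloom filter when a column is absent from a file) can be sketched as follows. This is only an illustration in Python, not the code of `HdfsParquetScanner::CreateColIdx2EqConjunctMap()`; the function name, the argument shapes, and the returned `skipped` list are assumptions made for the example.

```python
def build_eq_conjunct_map(file_schema_columns, eq_conjuncts):
    """Map column index -> literal values of equality conjuncts.

    Columns that the file schema does not contain are reported in
    `skipped` instead of raising an error, so the caller can simply
    not use a Bloom filter for them.
    """
    col_index = {name: i for i, name in enumerate(file_schema_columns)}
    result = {}
    skipped = []
    for column, value in eq_conjuncts:
        idx = col_index.get(column)
        if idx is None:
            # File was written before the column was added.
            skipped.append(column)
            continue
        result.setdefault(idx, []).append(value)
    return result, skipped


# File written before "name" was added vs. file written afterwards.
old_file = build_eq_conjunct_map(["id"], [("name", "Lily")])
new_file = build_eq_conjunct_map(["id", "name"], [("name", "Lily")])
print(old_file)
print(new_file)
```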





[jira] [Resolved] (IMPALA-10918) Allow map type in SELECT list

2022-09-08 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10918.

Resolution: Implemented

> Allow map type in SELECT list
> -
>
> Key: IMPALA-10918
> URL: https://issues.apache.org/jira/browse/IMPALA-10918
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Daniel Becker
>Priority: Major
>  Labels: complextype
>
> This covers collections: Map
> Expected printout format:
> Map:   {"k1":2,"k2":null}
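
The expected printout looks like compact JSON. As an illustration only (this is not Impala's implementation), the same text can be produced in Python with `json.dumps` and compact separators; the helper name `format_map` is hypothetical.

```python
import json


def format_map(mapping):
    # Compact separators drop the spaces json.dumps inserts by default;
    # None is rendered as JSON null.
    return json.dumps(mapping, separators=(",", ":"))


print(format_map({"k1": 2, "k2": None}))
```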





[jira] [Resolved] (IMPALA-11427) TestOrcStats.test_orc_stats fails

2022-09-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11427.

Resolution: Done

> TestOrcStats.test_orc_stats fails
> -
>
> Key: IMPALA-11427
> URL: https://issues.apache.org/jira/browse/IMPALA-11427
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
>
> In one of the builds, query_test.test_orc_stats.TestOrcStats.test_orc_stats 
> fails:
> {code:java}
> query_test/test_orc_stats.py:40: in test_orc_stats
> self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:820: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:665: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected 
> results.
> E   EXPECTED VALUE:
> E   5
> E   
> E   
> E   ACTUAL VALUE:
> E   0
> {code}





[jira] [Resolved] (IMPALA-10753) Incorrect length when multiple CHAR(N) values are inserted

2022-09-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10753.

Resolution: Cannot Reproduce

> Incorrect length when multiple CHAR(N) values are inserted
> --
>
> Key: IMPALA-10753
> URL: https://issues.apache.org/jira/browse/IMPALA-10753
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: correctness, ramp-up
>
> To reproduce:
> {code}
> CREATE TABLE impala_char_insert (s STRING);
> -- all values are CHAR(N) with different N, but all will use the biggest N
> INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" 
> AS CHAR(2))), (CAST("123" AS CHAR(3)));
> SELECT length(s) FROM impala_char_insert;
> results:
> 3
> 3
> 3
> -- inserting the same values in separate INSERTs works correctly
> INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1)));
> INSERT INTO impala_char_insert VALUES (CAST("12" AS CHAR(2)));
> INSERT INTO impala_char_insert VALUES (CAST("123" AS CHAR(3)));
> SELECT length(s) FROM impala_char_insert;
> results:
> 1
> 2
> 3
> -- if one value is not CHAR(N), then the lengths are correct
> INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" 
> AS VARCHAR(2))), (CAST("123" AS CHAR(3)));
> SELECT length(s) FROM impala_char_insert;
> results:
> 1
> 2
> 3
> {code}
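
The reported lengths are what one would see if every CHAR(N) literal in a single VALUES clause were cast to the widest CHAR type before insertion. The sketch below only emulates that reported behaviour in Python, under that assumption; it is not Impala code, the helper names are hypothetical, and the issue itself was resolved as Cannot Reproduce.

```python
def cast_to_char(value, n):
    """Emulate CHAR(n): truncate or pad with spaces to exactly n characters."""
    return value[:n].ljust(n)


def insert_with_common_type(values_with_n):
    # Assumption: all CHAR(N) literals are unified to CHAR(max N).
    max_n = max(n for _, n in values_with_n)
    return [cast_to_char(value, max_n) for value, _ in values_with_n]


rows = insert_with_common_type([("1", 1), ("12", 2), ("123", 3)])
lengths = [len(row) for row in rows]
print(lengths)
```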





[jira] [Resolved] (IMPALA-11432) TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup

2022-09-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11432.

Resolution: Cannot Reproduce

> TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup
> --
>
> Key: IMPALA-11432
> URL: https://issues.apache.org/jira/browse/IMPALA-11432
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
>
> In one of the exhaustive builds 
> authorization.test_ranger.TestRanger.test_grant_revoke_with_role failed with 
> one of the impalads stuck during startup:
> Stacktrace
> {code:java}
> common/custom_cluster_test_suite.py:181: in setup_method
> self._start_impala_cluster(cluster_args, **kwargs)
> common/custom_cluster_test_suite.py:285: in _start_impala_cluster
> check_call(cmd + options, close_fds=True)
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/Impala-Toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py:190:
>  in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command 
> '['/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py',
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
> '--num_coordinators=3', 
> '--log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests',
>  '--log_level=1', '--impalad_args=--server-name=server1 
> --ranger_service_type=hive --ranger_app_id=impala 
> --authorization_provider=ranger 
> --use_customized_user_groups_mapper_for_ranger ', '--state_store_args=None ', 
> '--catalogd_args=--server-name=server1 --ranger_service_type=hive 
> --ranger_app_id=impala --authorization_provider=ranger 
> --use_customized_user_groups_mapper_for_ranger ', 
> '--impalad_args=--default_query_options=']' returned non-zero exit status 1
> {code}
> Standard Error
> {code:java}
> -- 2022-07-14 01:07:04,943 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--server-name=server1 
> --ranger_service_type=hive --ranger_app_id=impala 
> --authorization_provider=ranger 
> --use_customized_user_groups_mapper_for_ranger ' '--state_store_args=None ' 
> '--catalogd_args=--server-name=server1 --ranger_service_type=hive 
> --ranger_app_id=impala --authorization_provider=ranger 
> --use_customized_user_groups_mapper_for_ranger ' 
> --impalad_args=--default_query_options=
> 01:07:05 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 01:07:05 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 01:07:05 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 01:07:05 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 01:07:05 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 01:07:05 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 01:07:08 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:09 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:10 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:11 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:12 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:13 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:14 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:15 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:16 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
> 01:07:17 MainThread: Found 2 impalad/1 statestored/1 catalogd process(e

[jira] [Resolved] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build

2022-09-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11431.

Resolution: Cannot Reproduce

> TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an 
> exhaustive build
> 
>
> Key: IMPALA-11431
> URL: https://issues.apache.org/jira/browse/IMPALA-11431
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
>
> In one of the exhaustive builds, 
> query_test.test_nested_types.TestComputeStatsWithNestedTypes.test_compute_stats_with_structs
>  fails:
> {code:java}
> query_test/test_nested_types.py:252: in test_compute_stats_with_structs
> self.run_test_case('QueryTest/compute-stats-with-structs', vector)
> common/impala_test_suite.py:778: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:588: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> 'alltypes','STRUCT',-1,-1,-1,-1.0,-1,-1
>  == 
> 'alltypes','STRUCT',-1,-1,-1,-1,-1,-1
> E 'id','INT',6,0,4,4.0,-1,-1 != 'id','INT',-1,-1,4,4,-1,-1
> E 'small_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 
> 'small_struct','STRUCT',-1,-1,-1,-1,-1,-1
> E 'str','STRING',6,0,11,10.330154,-1,-1 != 
> 'str','STRING',-1,-1,-1,-1,-1,-1
> E 'tiny_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 
> 'tiny_struct','STRUCT',-1,-1,-1,-1,-1,-1
> {code}





[jira] [Resolved] (IMPALA-11425) Python TypeError: super() takes at least 1 argument (0 given)

2022-09-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11425.

Resolution: Duplicate

> Python TypeError: super() takes at least 1 argument (0 given)
> -
>
> Key: IMPALA-11425
> URL: https://issues.apache.org/jira/browse/IMPALA-11425
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
>
> The following error happens in various builds during tarball creation:
> {code:java}
> Traceback (most recent call last):
>   File "setup.py", line 167, in <module>
> 'Topic :: Database :: Front-Ends'
>   File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
> dist.run_commands()
>   File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
> self.run_command(cmd)
>   File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
> cmd_obj.run()
>   File 
> "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/command/sdist.py",
>  line 153, in run
> self.run_command(cmd_name)
>   File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
> self.distribution.run_command(command)
>   File "/usr/lib64/python2.7/distutils/dist.py", line 970, in run_command
> cmd_obj = self.get_command_obj(command)
>   File "/usr/lib64/python2.7/distutils/dist.py", line 845, in get_command_obj
> klass = self.get_command_class(command)
>   File 
> "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/dist.py", line 
> 410, in get_command_class
> return _Distribution.get_command_class(self, command)
>   File "/usr/lib64/python2.7/distutils/dist.py", line 815, in 
> get_command_class
> __import__ (module_name)
>   File "/usr/lib64/python2.7/distutils/command/check.py", line 13, in <module>
> from docutils.utils import Reporter
>   File 
> "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", 
> line 123, in <module>
> release=True  # True for official releases and pre-releases
>   File 
> "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", 
> line 93, in __new__
> return super().__new__(cls, major, minor, micro,
> TypeError: super() takes at least 1 argument (0 given)
> {code}
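
The last frame of the traceback calls `super()` without arguments, which is only valid in Python 3; under Python 2.7 it raises exactly this TypeError. A minimal sketch of the portable spelling follows. The `VersionInfo` class here is a hypothetical stand-in written for this example, not the docutils source.

```python
from collections import namedtuple


class VersionInfo(namedtuple("VersionInfo", "major minor micro")):
    """Hypothetical stand-in for a class that overrides __new__."""

    def __new__(cls, major=0, minor=0, micro=0):
        # Python 3 only: super().__new__(cls, major, minor, micro)
        # Portable (Python 2.7 and 3): pass the class and cls explicitly.
        return super(VersionInfo, cls).__new__(cls, major, minor, micro)


info = VersionInfo(1, 2, 3)
print(info.major, info.minor, info.micro)
```

A package that uses the zero-argument form cannot be imported under Python 2.7 at all, so the failure appears as soon as the module is loaded.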





[jira] [Created] (IMPALA-11643) Implement ColumnType::ToIR() for non-scalar types

2022-10-07 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11643:
--

 Summary: Implement ColumnType::ToIR() for non-scalar types
 Key: IMPALA-11643
 URL: https://issues.apache.org/jira/browse/IMPALA-11643
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


Currently ColumnType::ToIR() is only implemented for scalar types. It should be 
extended to support all types.





[jira] [Created] (IMPALA-11645) Remove PrintThriftEnum functions in debug-utils.cc

2022-10-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11645:
--

 Summary: Remove PrintThriftEnum functions in debug-utils.cc
 Key: IMPALA-11645
 URL: https://issues.apache.org/jira/browse/IMPALA-11645
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


Before IMPALA-5690 we implemented operator<< for Thrift enums in Impala code. 
These functions printed the names of the enums.

Then we upgraded to Thrift 0.9.3, but that release included THRIFT-2067, which 
implemented operator<< for Thrift enums, but printed the number value of enums 
instead of their names. To preserve the old behaviour in Impala, we renamed our 
own implementations of operator<< to PrintThriftEnum, a function that we 
defined for each Thrift enum we used, and which returned a string with the 
names - not the numbers - of the enums.

After upgrading Thrift to a version that included THRIFT-3921 (any version 
starting from 0.11.0), these PrintThriftEnum functions are no longer necessary 
as the operator<< provided by Thrift now prints the names of enums, which is 
the behaviour we want.
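
The difference between the two printing behaviours can be shown with a small Python analogy. This is not Thrift-generated code; `ExampleState`, `print_numeric` and `print_name` are hypothetical names used only to contrast "print the number" with "print the name".

```python
from enum import IntEnum


class ExampleState(IntEnum):
    """Hypothetical enum standing in for a Thrift-generated one."""
    CREATED = 0
    RUNNING = 1
    FINISHED = 2


def print_numeric(value):
    # Analogous to an operator<< that prints the numeric value.
    return "%d" % int(value)


def print_name(value):
    # Analogous to PrintThriftEnum, or to an operator<< that prints names.
    return value.name


print(print_numeric(ExampleState.RUNNING), print_name(ExampleState.RUNNING))
```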





[jira] [Resolved] (IMPALA-10356) Analyzed query in explain plan is not quite right for insert with values clause

2022-10-12 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10356.

Resolution: Fixed

> Analyzed query in explain plan is not quite right for insert with values 
> clause
> ---
>
> Key: IMPALA-10356
> URL: https://issues.apache.org/jira/browse/IMPALA-10356
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: newbie, ramp-up
>
> In impala-shell:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> set explain_level=2;
> explain insert into double_tbl values (-0.43149576573887316);
> {noformat}
> {noformat}
> +----------------------------------------------------------------------------------+
> | Explain String                                                                   |
> +----------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B Threads=1                           |
> | Per-Host Resource Estimates: Memory=10MB                                         |
> | Codegen disabled by planner                                                      |
> | Analyzed query: SELECT CAST(-0.43149576573887316 AS DECIMAL(17,17)) UNION SELECT |
> | CAST(-0.43149576573887316 AS DECIMAL(17,17))                                     |
> |                                                                                  |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                            |
> | |  Per-Host Resources: mem-estimate=8B mem-reservation=0B thread-reservation=1   |
> | WRITE TO HDFS [default.double_tbl, OVERWRITE=false]                              |
> | |  partitions=1                                                                  |
> | |  output exprs: CAST(-0.43149576573887316 AS DOUBLE)                            |
> | |  mem-estimate=8B mem-reservation=0B thread-reservation=0                       |
> | |                                                                                |
> | 00:UNION                                                                         |
> |    constant-operands=1                                                           |
> |    mem-estimate=0B mem-reservation=0B thread-reservation=0                       |
> |    tuple-ids=0 row-size=8B cardinality=1                                         |
> |    in pipelines:                                                                 |
> +----------------------------------------------------------------------------------+
> The analyzed query does not make sense. We should investigate and fix it.





[jira] [Resolved] (IMPALA-11623) Put *-ir.cc files into their own libraries to avoid extra recompilation

2022-10-18 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11623.

Resolution: Implemented

> Put *-ir.cc files into their own libraries to avoid extra recompilation
> ---
>
> Key: IMPALA-11623
> URL: https://issues.apache.org/jira/browse/IMPALA-11623
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Daniel Becker
>Priority: Major
>
> It is desirable to be able to iterate quickly by running "make -j impalad" 
> while modifying a file. Currently, modifying most files incurs a rebuild of 
> the LLVM IR, which is a slow serial step. For example:
>  
> {noformat}
> $ touch be/src/runtime/coordinator.cc
> $ make -j impalad
> ...
> [ 98%] Generating ../../../llvm-ir/impala.bc
> [ 98%] Generating ../../../llvm-ir/impala-legacy-avx.bc
> [ 98%] Generating ../../generated-sources/impala-ir/impala-ir.cc
> [ 98%] Generating ../../generated-sources/impala-ir/impala-ir-legacy-avx.cc
> ...{noformat}
> This can add several seconds to an incremental build. This step happens for 
> files that do not actually impact the LLVM IR, so there are ways to avoid 
> this.
> The reason that LLVM IR is rebuilt is that it has dependencies on Exec, 
> Exprs, Runtime, Udf, Util, and other libraries:
>  
> {noformat}
> add_custom_command(
>   OUTPUT ${IR_OUTPUT_FILE}
>   COMMAND ${LLVM_CLANG_EXECUTABLE} ${CLANG_IR_CXX_FLAGS} 
> ${PLATFORM_SPECIFIC_FLAGS}
>           ${CLANG_INCLUDE_FLAGS} ${IR_INPUT_FILES} -o ${IR_TMP_OUTPUT_FILE}
>   COMMAND ${LLVM_OPT_EXECUTABLE} ${LLVM_OPT_IR_FLAGS} < ${IR_TMP_OUTPUT_FILE} 
> > ${IR_OUTPUT_FILE}
>   COMMAND rm ${IR_TMP_OUTPUT_FILE}
>   DEPENDS Exec ExecAvro ExecKudu Exprs Runtime Udf Util ${IR_INPUT_FILES}
> ){noformat}
> From a correctness perspective, the LLVM IR only cares about things that 
> impact the content of the *-ir.cc files, because impala-ir.cc includes every 
> *-ir.cc file. That list of libraries is a superset of what is needed.
> If the *-ir.cc files were split off into their own libraries (i.e. ExecIr 
> rather than Exec), then this target would only depend on the ExecIr rather 
> than the larger Exec. This would reduce the number of files that would cause 
> LLVM IR to be rebuilt. That should reduce the runtime of an incremental "make 
> -j impalad" for quite a few C++ files.
>  
>  





[jira] [Resolved] (IMPALA-11581) ALTER TABLE RENAME TO doesn't update transient_lastDdlTime

2022-10-19 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11581.

Resolution: Fixed

> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime
> --
>
> Key: IMPALA-11581
> URL: https://issues.apache.org/jira/browse/IMPALA-11581
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: ramp-up
>
> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime.
> The following statements behave differently when executed via Hive or Impala:
> {noformat}
> CREATE TABLE rename_from (i int);
> ALTER TABLE rename_from RENAME TO rename_to;
> {noformat}
> During ALTER TABLE ... RENAME TO ... Hive updates transient_lastDdlTime while 
> Impala leaves it unchanged.
> Impala should follow Hive's behavior.





[jira] [Resolved] (IMPALA-11645) Remove PrintThriftEnum functions in debug-utils.cc

2022-10-20 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11645.

Resolution: Implemented

> Remove PrintThriftEnum functions in debug-utils.cc
> --
>
> Key: IMPALA-11645
> URL: https://issues.apache.org/jira/browse/IMPALA-11645
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Before IMPALA-5690 we implemented operator<< for Thrift enums in Impala code. 
> These functions printed the names of the enums.
> Then we upgraded to Thrift 0.9.3, but that release included THRIFT-2067, 
> which implemented operator<< for Thrift enums, but printed the number value 
> of enums instead of their names. To preserve the old behaviour in Impala, we 
> renamed our own implementations of operator<< to PrintThriftEnum, a function 
> that we defined for each Thrift enum we used, and which returned a string 
> with the names - not the numbers - of the enums.
> After upgrading Thrift to a version that included THRIFT-3921 (any version 
> starting from 0.11.0), these PrintThriftEnum functions are no longer 
> necessary as the operator<< provided by Thrift now prints the names of enums, 
> which is the behaviour we want.





[jira] [Resolved] (IMPALA-11462) shiftleft problem

2022-10-24 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11462.

Resolution: Fixed

> shiftleft problem
> -
>
> Key: IMPALA-11462
> URL: https://issues.apache.org/jira/browse/IMPALA-11462
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 3.4.1
>Reporter: jack sun
>Assignee: Daniel Becker
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> If the second parameter of the function 'shiftleft' is changed to a dynamic 
> value, the first parameter is converted to tinyint.
>  !screenshot-1.png! 





[jira] [Created] (IMPALA-11685) Slot memory sharing between struct and field not working if the field is also a struct

2022-10-25 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11685:
--

 Summary: Slot memory sharing between struct and field not working 
if the field is also a struct
 Key: IMPALA-11685
 URL: https://issues.apache.org/jira/browse/IMPALA-11685
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Daniel Becker
Assignee: Daniel Becker


IMPALA-10838 introduced slot memory sharing: if a struct and one of its fields 
are both present in the select list, no extra slot is generated in the row for 
the struct field. Instead, the memory of the struct is reused, so the row size 
is the same as when only the struct is queried. This works when the struct 
field is a primitive type:
{code:java}
explain select id, outer_struct from 
functional_orc_def.complextypes_nested_structs;
row-size=64B{code}
{code:java}
explain select id, outer_struct, outer_struct.str from 
functional_orc_def.complextypes_nested_structs;
row-size=64B{code}
However, it does not work if the child is itself a struct:
{code:java}
explain select id, outer_struct, outer_struct.inner_struct3 from 
functional_orc_def.complextypes_nested_structs;
row-size=80B{code}
This is because struct slot descriptors are registered before others so that it 
is easier to reuse the slot memory of the struct fields, but struct slot 
descriptors among themselves are sorted in the wrong order (see 
[https://github.com/apache/impala/blob/c12ac6c27b2df1eae693b44c157d65499f491d21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L340]).
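
Why the registration order matters can be shown with a small sketch. It is written in Python under the assumption that a nested struct can reuse memory only if its enclosing struct already has a slot when the nested one is processed; the function names and the tuple representation of paths are hypothetical and do not mirror SelectStmt.java.

```python
def order_struct_paths(paths):
    """Sort so that an enclosing struct precedes the structs nested in it."""
    return sorted(paths, key=lambda path: (len(path), path))


def assign_slots(ordered_paths):
    """Return the paths that get their own slot, processing in the given order.

    A path reuses memory (gets no slot of its own) only if one of its
    ancestors has already been given a slot.
    """
    own_slot = []
    for path in ordered_paths:
        has_ancestor = any(path[:len(a)] == a for a in own_slot)
        if not has_ancestor:
            own_slot.append(path)
    return own_slot


outer = ("outer_struct",)
inner = ("outer_struct", "inner_struct3")

child_first = assign_slots([inner, outer])
parent_first = assign_slots(order_struct_paths([inner, outer]))
print(len(child_first), len(parent_first))
```

With the child processed first, both paths end up with their own slot, which corresponds to the larger row size reported above; with the parent first, only one slot is needed.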





[jira] [Created] (IMPALA-11687) Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails

2022-10-27 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11687:
--

 Summary: Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex 
types fails
 Key: IMPALA-11687
 URL: https://issues.apache.org/jira/browse/IMPALA-11687
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Daniel Becker
Assignee: Daniel Becker


If EXPAND_COMPLEX_TYPES is set to true, some queries that combine star 
expressions and explicitly given complex columns fail:
{code:java}
select outer_struct, * from functional_orc_def.complextypes_nested_structs;
ERROR: IllegalStateException: Illegal reference to non-materialized slot: tid=1 
sid=1{code}
{code:java}
select *, outer_struct.str from functional_orc_def.complextypes_nested_structs;
ERROR: IllegalStateException: null{code}
Having two stars in a table with complex columns also fails.
{code:java}
select *, * from functional_orc_def.complextypes_nested_structs;{code}





[jira] [Resolved] (IMPALA-11685) Slot memory sharing between struct and field not working if the field is also a struct

2022-10-27 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11685.

Resolution: Fixed

> Slot memory sharing between struct and field not working if the field is also 
> a struct
> --
>
> Key: IMPALA-11685
> URL: https://issues.apache.org/jira/browse/IMPALA-11685
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> IMPALA-10838 introduced slot memory sharing: if a struct and one of its 
> fields are both present in the select list, no extra slot is generated in the 
> row for the struct field. Instead, the memory of the struct is reused, so the 
> row size is the same as when only the struct is queried. This works when the 
> struct field is a primitive type:
> {code:java}
> explain select id, outer_struct from 
> functional_orc_def.complextypes_nested_structs;
> row-size=64B{code}
> {code:java}
> explain select id, outer_struct, outer_struct.str from 
> functional_orc_def.complextypes_nested_structs;
> row-size=64B{code}
> However, it does not work if the child is itself a struct:
> {code:java}
> explain select id, outer_struct, outer_struct.inner_struct3 from 
> functional_orc_def.complextypes_nested_structs;
> row-size=80B{code}
> This is because struct slot descriptors are registered before others so that 
> it is easier to reuse the slot memory of the struct fields, but struct slot 
> descriptors among themselves are sorted in the wrong order (see 
> [https://github.com/apache/impala/blob/c12ac6c27b2df1eae693b44c157d65499f491d21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L340]).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11692) Struct slot memory sharing involving select * not working properly

2022-10-28 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11692:
--

 Summary: Struct slot memory sharing involving select * not working 
properly 
 Key: IMPALA-11692
 URL: https://issues.apache.org/jira/browse/IMPALA-11692
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


With EXPAND_COMPLEX_TYPES=1, if there are structs coming from the star 
expansion and members of the structs are also given explicitly, slot memory 
sharing does not work in some cases:
{code:java}
explain select * from functional_orc_def.complextypes_nested_structs;
row-size=64B{code}
{code:java}
explain select *, outer_struct.inner_struct1 from 
functional_orc_def.complextypes_nested_structs;
row-size=80B{code}
The row size should be the same in both cases as outer_struct.inner_struct1 is 
part of outer_struct which is included in the star.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11687) Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails

2022-11-04 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11687.

Resolution: Fixed

> Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails
> -
>
> Key: IMPALA-11687
> URL: https://issues.apache.org/jira/browse/IMPALA-11687
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> If EXPAND_COMPLEX_TYPES is set to true, some queries that combine star 
> expressions and explicitly given complex columns fail:
> {code:java}
> select outer_struct, * from functional_orc_def.complextypes_nested_structs;
> ERROR: IllegalStateException: Illegal reference to non-materialized slot: 
> tid=1 sid=1{code}
> {code:java}
> select *, outer_struct.str from 
> functional_orc_def.complextypes_nested_structs;
> ERROR: IllegalStateException: null{code}
> A query with two star expressions on a table with complex columns also fails.
> {code:java}
> select *, * from functional_orc_def.complextypes_nested_structs;
> ERROR: IllegalStateException: Illegal reference to non-materialized slot: 
> tid=6 sid=13{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11712) Sort out column masking with complex types

2022-11-08 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11712:
--

 Summary: Sort out column masking with complex types
 Key: IMPALA-11712
 URL: https://issues.apache.org/jira/browse/IMPALA-11712
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Daniel Becker


We determine whether a SlotDescriptor created from a star expanded path should 
be registered for column masking based on the path of the star item:

??Empty matched types means this is expanded from star of a catalog table.??
??For star of complex types, e.g. my_struct.*, my_array.*, my_map.*, the 
matched??
??types will have the complex type so it's not empty.??

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L659]

However, this comment may be wrong, because in the query
{code:java}
select a.* from mix_struct_array t, t.struct_in_arr a;{code}
{{getMatchedTypes()}} returns an empty list for the star path even though it is 
not from a catalog table.

We should also find out whether we can determine from the expanded path alone 
(and not the path of the star item) whether we need to register it for column 
masking, for example by checking if it is within a complex type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11692) Struct slot memory sharing involving select * not working properly

2022-11-09 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11692.

Resolution: Fixed

> Struct slot memory sharing involving select * not working properly 
> ---
>
> Key: IMPALA-11692
> URL: https://issues.apache.org/jira/browse/IMPALA-11692
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> With EXPAND_COMPLEX_TYPES=1, if there are structs coming from the star 
> expansion and members of the structs are also given explicitly, slot memory 
> sharing does not work in some cases:
> {code:java}
> explain select * from functional_orc_def.complextypes_nested_structs;
> row-size=64B{code}
> {code:java}
> explain select *, outer_struct.inner_struct1 from 
> functional_orc_def.complextypes_nested_structs;
> row-size=80B{code}
> The row size should be the same in both cases as outer_struct.inner_struct1 
> is part of outer_struct which is included in the star.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11717) Use rapidjson for printing collections

2022-11-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11717:
--

 Summary: Use rapidjson for printing collections
 Key: IMPALA-11717
 URL: https://issues.apache.org/jira/browse/IMPALA-11717
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


We use rapidjson to print structs but don't use it to print collections (arrays 
and maps). We should switch to rapidjson also for collections to have a uniform 
approach.

This is also needed if we want to support embedding structs and collections in 
each other, see [IMPALA-9551|https://issues.apache.org/jira/browse/IMPALA-9551].
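
As a rough illustration of what a uniform approach buys us (this is not Impala's 
backend code, which is C++ and would use rapidjson), the sketch below uses 
Python's standard json module as a stand-in. One serializer handles arrays, maps 
and structs, including when they are embedded in each other. The function name 
and the sample values are made up for the example.

```python
import json


def print_complex(value):
    """Serialize a nested value (list = array, dict = struct or map) compactly.

    Stand-in for a single JSON writer used for every complex type.
    """
    # Compact separators: no space after ',' or ':'.
    return json.dumps(value, separators=(",", ":"))


# An array, a struct, and an array of structs that contain arrays all go
# through the same code path.
print(print_complex([None, 1, 2]))
print(print_complex({"i": 98765, "s": None}))
print(print_complex([{"i": 1, "arr": [None, 2]}]))
```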



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11719) Inconsistency in printing NULL values

2022-11-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11719:
--

 Summary: Inconsistency in printing NULL values
 Key: IMPALA-11719
 URL: https://issues.apache.org/jira/browse/IMPALA-11719
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker


If they are top level or in collections, null values are printed as "NULL":
{code:java}
select int_array from functional_parquet.complextypestbl;
+------------------------+
| int_array              |
+------------------------+
| [-1]                   |
| [1,2,3]                |
| [NULL,1,2,NULL,3,NULL] |
| []                     |
| NULL                   |
| NULL                   |
| NULL                   |
| NULL                   |
+------------------------+{code}
If they are in a struct, they are printed as "null":
{code:java}
select small_struct from functional_parquet.complextypes_structs;
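+------------------------------------+
| small_struct                       |
+------------------------------------+
| NULL                               |
| {"i":19191,"s":"small_struct_str"} |
| {"i":98765,"s":null}               |
| {"i":null,"s":"str"}               |
| {"i":98765,"s":"abcde f"}          |
| {"i":null,"s":null}                |
+------------------------------------+{code}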
In Hive the situation is a bit different: "NULL" is used only for top level 
values and "null" is printed in both collections and structs.
{code:java}
select int_array from functional_parquet.complextypestbl;
+-------------------------+
|        int_array        |
+-------------------------+
| [-1]                    |
| [1,2,3]                 |
| [null,1,2,null,3,null]  |
| []                      |
| NULL                    |
| NULL                    |
| NULL                    |
| NULL                    |
+-------------------------+{code}
{code:java}
select small_struct from functional_parquet.complextypes_structs;
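+-------------------------------------+
|            small_struct             |
+-------------------------------------+
| NULL                                |
| {"i":19191,"s":"small_struct_str"}  |
| {"i":98765,"s":null}                |
| {"i":null,"s":"str"}                |
| {"i":98765,"s":"abcde f"}           |
| {"i":null,"s":null}                 |
+-------------------------------------+{code}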
In JSON the relevant keyword is "null".

We should decide how we handle this situation.
 # Have a uniform NULL representation everywhere: top level, collections and 
structs
 ** either "NULL" or "null" everywhere
 # Have "NULL" on the top level and "null" in collections and structs, like Hive
 # Leave everything as it is now: "NULL" at the top level and in collections, 
"null" in structs.

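To make the three options easier to compare, here is a minimal sketch in Python 
(not Impala code). It renders a nested value with a separately configurable null 
literal for the top level, for collections and for structs. The function name, 
the policy tuples and the quoting of strings are assumptions made for the 
example.

```python
def render(value, top_null, coll_null, struct_null, _null=None):
    """Render a value; lists stand for collections, dicts for structs.

    The null literal depends on where the null sits: top level, directly
    inside a collection, or directly inside a struct.
    """
    null = top_null if _null is None else _null
    if value is None:
        return null
    if isinstance(value, list):
        items = (render(v, top_null, coll_null, struct_null, coll_null)
                 for v in value)
        return "[" + ",".join(items) + "]"
    if isinstance(value, dict):
        fields = ('"%s":%s' % (k, render(v, top_null, coll_null, struct_null,
                                         struct_null))
                  for k, v in value.items())
        return "{" + ",".join(fields) + "}"
    if isinstance(value, str):
        # Simplification: strings are always quoted, as inside a struct.
        return '"%s"' % value
    return str(value)


# (top level, collections, structs)
UNIFORM_UPPER = ("NULL", "NULL", "NULL")   # option 1, upper-case variant
HIVE_LIKE = ("NULL", "null", "null")       # option 2
CURRENT_IMPALA = ("NULL", "NULL", "null")  # option 3

for policy in (UNIFORM_UPPER, HIVE_LIKE, CURRENT_IMPALA):
    print(render([None, 1, 2, None, 3, None], *policy),
          render({"i": 98765, "s": None}, *policy),
          render(None, *policy))
```

With CURRENT_IMPALA and HIVE_LIKE the rendered strings match the array and 
struct rows shown in the outputs above.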


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11722) Wrong error message when unsupported complex type comes from * expression

2022-11-11 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11722:
--

 Summary: Wrong error message when unsupported complex type comes 
from * expression
 Key: IMPALA-11722
 URL: https://issues.apache.org/jira/browse/IMPALA-11722
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Daniel Becker


The following query fails with a NullPointerException:

 
{code:java}
select * from functional_orc_def.complextypestbl;
ERROR: NullPointerException: null
{code}
 

The table contains a struct, {{nested_struct}}, which is not supported yet 
because it contains collections. If the columns are listed explicitly, the 
error message is the correct one:

{code:java}
select id, int_array, int_array_array, int_map, int_map_array, nested_struct 
from functional_orc_def.complextypestbl;
ERROR: AnalysisException: Struct containing a collection type is not allowed in 
the select list.{code}
The same error message should be returned in the select * case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11719) Inconsistency in printing NULL values

2022-11-15 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11719.

Resolution: Fixed

> Inconsistency in printing NULL values
> -
>
> Key: IMPALA-11719
> URL: https://issues.apache.org/jira/browse/IMPALA-11719
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> If they are top level or in collections, null values are printed as "NULL":
> {code:java}
> select int_array from functional_parquet.complextypestbl;
> +------------------------+
> | int_array              |
> +------------------------+
> | [-1]                   |
> | [1,2,3]                |
> | [NULL,1,2,NULL,3,NULL] |
> | []                     |
> | NULL                   |
> | NULL                   |
> | NULL                   |
> | NULL                   |
> +------------------------+{code}
> If they are in a struct, they are printed as "null":
> {code:java}
> select small_struct from functional_parquet.complextypes_structs;
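> +------------------------------------+
> | small_struct                       |
> +------------------------------------+
> | NULL                               |
> | {"i":19191,"s":"small_struct_str"} |
> | {"i":98765,"s":null}               |
> | {"i":null,"s":"str"}               |
> | {"i":98765,"s":"abcde f"}          |
> | {"i":null,"s":null}                |
> +------------------------------------+{code}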
> In Hive the situation is a bit different: "NULL" is used only for top level 
> values and "null" is printed in both collections and structs.
> {code:java}
> select int_array from functional_parquet.complextypestbl;
> +-------------------------+
> |        int_array        |
> +-------------------------+
> | [-1]                    |
> | [1,2,3]                 |
> | [null,1,2,null,3,null]  |
> | []                      |
> | NULL                    |
> | NULL                    |
> | NULL                    |
> | NULL                    |
> +-------------------------+{code}
> {code:java}
> select small_struct from functional_parquet.complextypes_structs;
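> +-------------------------------------+
> |            small_struct             |
> +-------------------------------------+
> | NULL                                |
> | {"i":19191,"s":"small_struct_str"}  |
> | {"i":98765,"s":null}                |
> | {"i":null,"s":"str"}                |
> | {"i":98765,"s":"abcde f"}           |
> | {"i":null,"s":null}                 |
> +-------------------------------------+{code}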
> Officially we print collections and structs in JSON form. In JSON the 
> relevant keyword is "null".
> We should decide how we handle this situation.
>  # Have a uniform NULL representation everywhere: top level, collections and 
> structs
>  ** either "NULL" or "null" everywhere
>  # Have "NULL" on the top level and "null" in collections and structs, like 
> Hive
>  # Leave everything as it is now: "NULL" at the top level and in collections, 
> "null" in structs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-11734) TestIcebergTable.test_compute_stats fails in RELEASE builds

2022-11-21 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-11734:
--

 Summary: TestIcebergTable.test_compute_stats fails in RELEASE 
builds
 Key: IMPALA-11734
 URL: https://issues.apache.org/jira/browse/IMPALA-11734
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


If the Impala version is set to a release build as described in point 8 in the 
"How to Release" document 
([https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate]),
 TestIcebergTable.test_compute_stats fails:
h3. Stacktrace
{code:java}
query_test/test_iceberg.py:852: in test_compute_stats
    self.run_test_case('QueryTest/iceberg-compute-stats', vector, unique_database)
common/impala_test_suite.py:742: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:578: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:469: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:278: in verify_query_result_is_equal
    assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E     2,1,'2.33KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes' != 2,1,'2.32KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'{code}
The problem is the file size, which is 2.32KB instead of 2.33KB. This is because 
the version is written into the file, and "x.y.z-RELEASE" is one byte shorter 
than "x.y.z-SNAPSHOT". The size of the file in this test is on the boundary 
between 2.32KB and 2.33KB, so this one byte can change the displayed value.
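
The boundary effect can be reproduced with a small Python sketch. The byte 
counts below are hypothetical (the real file sizes are not given here), and the 
formatting rule (1 KB = 1024 bytes, rounded to two decimals) is an assumption 
about how the size is displayed.

```python
def pretty_kb(num_bytes):
    # Assumption: size shown in KB (1 KB = 1024 bytes) with two decimals.
    return "%.2fKB" % (num_bytes / 1024.0)


# The two version suffixes differ in length by exactly one byte.
byte_diff = len("x.y.z-SNAPSHOT") - len("x.y.z-RELEASE")

# Hypothetical file sizes sitting right at the rounding boundary.
snapshot_size = 2381
release_size = snapshot_size - byte_diff

print(byte_diff, pretty_kb(snapshot_size), pretty_kb(release_size))
```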

We could use a row_regex to accept both values so it works for both snapshot 
and release versions.
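
A minimal sketch of such a pattern, checked here with Python's re module rather 
than with the Impala test framework. The exact expression is an assumption; in 
the .test file it would go on a row_regex line.

```python
import re

# Accept both 2.32KB and 2.33KB, and do not depend on the generated
# database name in the HDFS path.
ROW_PATTERN = re.compile(
    r"2,1,'2\.3[23]KB','NOT CACHED','NOT CACHED','PARQUET','false',"
    r"'hdfs://.*/ice_alltypes'$"
)


def row_matches(row):
    return ROW_PATTERN.match(row) is not None


PATH = ("hdfs://localhost:20500/test-warehouse/"
        "test_compute_stats_74dbc105.db/ice_alltypes")
snapshot_row = "2,1,'2.33KB','NOT CACHED','NOT CACHED','PARQUET','false','%s'" % PATH
release_row = "2,1,'2.32KB','NOT CACHED','NOT CACHED','PARQUET','false','%s'" % PATH

print(row_matches(snapshot_row), row_matches(release_row))
```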



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

