[jira] [Resolved] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()
[ https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8381. --- Resolution: Done Implemented and merged. > Remove branch from ParquetPlainEncoder::Decode() > > > Key: IMPALA-8381 > URL: https://issues.apache.org/jira/browse/IMPALA-8381 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Daniel Becker >Priority: Minor > Labels: newbie, parquet, performance, ramp-up > > Removing the "if" at > https://github.com/apache/impala/blob/5670f96b828d57f9e36510bb9af02bcc31de775c/be/src/exec/parquet/parquet-common.h#L203 > can lead to a 1.5x speedup in plain decoding (type=int32, stride=16). For > primitive types, the same check can be done for a whole batch, so the speedup > can be gained for large batches without losing safety. The only Parquet type > where this check is needed per element is BYTE_ARRAY (typically used for > STRING columns), which already has a template specialization for > ParquetPlainEncoder::Decode(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
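The batch-level check described in IMPALA-8381 can be sketched as follows. This is a minimal, hypothetical illustration (the `DecodeBatch` name and signature are not Impala's `ParquetPlainEncoder::Decode()`), assuming fixed-width primitive values, a little-endian host and an output stride measured in bytes: the input length is validated once for the whole batch, which leaves a branch-free copy loop.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not Impala's ParquetPlainEncoder: decode num_values
// plain-encoded fixed-width values of type T from [buffer, buffer_end) into
// `out`, writing one value every `out_stride` bytes. Assumes a little-endian
// host, since Parquet PLAIN stores fixed-width values little-endian.
// Returns the number of input bytes consumed, or -1 if the input is too short.
template <typename T>
int64_t DecodeBatch(const uint8_t* buffer, const uint8_t* buffer_end,
                    int64_t num_values, int64_t out_stride, uint8_t* out) {
  assert(num_values >= 0 && out_stride >= static_cast<int64_t>(sizeof(T)));
  const int64_t bytes_needed = num_values * static_cast<int64_t>(sizeof(T));
  // One bounds check for the whole batch instead of one per value.
  if (buffer_end - buffer < bytes_needed) return -1;
  for (int64_t i = 0; i < num_values; ++i) {
    // No per-element branch: the check above guarantees this read is in bounds.
    std::memcpy(out + i * out_stride, buffer + i * sizeof(T), sizeof(T));
  }
  return bytes_needed;
}
```

Variable-length BYTE_ARRAY values cannot use this shortcut, because the length of each value is only known after reading its prefix, which is why the issue keeps the per-element check for that type.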
[jira] [Resolved] (IMPALA-8467) ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds
[ https://issues.apache.org/jira/browse/IMPALA-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8467. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 > ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds > -- > > Key: IMPALA-8467 > URL: https://issues.apache.org/jira/browse/IMPALA-8467 > Project: IMPALA > Issue Type: Bug >Reporter: Laszlo Gaal >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build > Fix For: Impala 3.3.0 > > > This is an example of the logged failures: > {code:java} > 00:57:35.147 15/106 Test #15: parquet-plain-test ...***Failed > 0.48 sec > 00:57:35.147 [==] Running 4 tests from 1 test case. > 00:57:35.147 [--] Global test environment set-up. > 00:57:35.148 [--] 4 tests from PlainEncoding > 00:57:35.148 [ RUN ] PlainEncoding.Basic > 00:57:35.148 = > 00:57:35.148 ==1922==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow > on address 0x7ffe328ee44c at pc 0x017c07bc bp 0x7ffe328ee2f0 sp > 0x7ffe328edaa0 > 00:57:35.148 READ of size 16 at 0x7ffe328ee44c thread T0 > 00:57:35.148 #0 0x17c07bb in __asan_memcpy > /mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:466 > 00:57:35.149 #1 0x1837a26 in void > impala::ParquetPlainEncoder::DecodeNoBoundsCheck (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, > impala::TimestampValue*) > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:332:3 > 00:57:35.149 #2 0x1837a26 in int > impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, > impala::TimestampValue*) > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:223 > 00:57:35.150 #3 0x1837216 in void > impala::TestTypeWidening (parquet::Type::type)3>(impala::TimestampValue const&, int) > 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:115:22 > 00:57:35.150 #4 0x18122f7 in impala::PlainEncoding_Basic_Test::TestBody() > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:155:3 > 00:57:35.151 #5 0x4fa6142 in void > testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4fa6142) > 00:57:35.151 #6 0x4f9d909 in testing::Test::Run() > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9d909) > 00:57:35.152 #7 0x4f9da57 in testing::TestInfo::Run() > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9da57) > 00:57:35.152 #8 0x4f9db34 in testing::TestCase::Run() > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9db34) > 00:57:35.153 #9 0x4f9edb7 in testing::internal::UnitTestImpl::RunAllTests() > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9edb7) > 00:57:35.153 #10 0x4f9f092 in testing::UnitTest::Run() > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9f092) > 00:57:35.153 #11 0x181655f in main > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:491:1 > 00:57:35.154 #12 0x7ff7a10b2c04 in __libc_start_main > (/lib64/libc.so.6+0x21c04) > 00:57:35.154 #13 0x17069d6 in _start > (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x17069d6) > 00:57:35.154 > 00:57:35.154 Address 0x7ffe328ee44c is located in stack of 
thread T0 at > offset 332 in frame > 00:57:35.154 #0 0x18378df in int > impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, > impala::TimestampValue*) > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:208 > 00:57:35.155 > 00:57:35.155 This frame has 4 object(s): > 00:57:35.155 [32, 40) 'ref.tmp.i' (line 327) > 00:57:35.155 [64, 68) 'ref.tmp2.i' (line 327) > 00:57:35.155 [80, 96) 'ref.tmp5.i' (line 327) > 00:57:35.155 [112, 120) 'ref.tmp6.i' (line 327) <== Memory access at offset > 332 overflows this variable > 00:57:35.155 HINT: this may be a false positive if your program uses some
[jira] [Created] (IMPALA-8710) Increase allowed bit width to 64 for bit packing
Daniel Becker created IMPALA-8710: - Summary: Increase allowed bit width to 64 for bit packing Key: IMPALA-8710 URL: https://issues.apache.org/jira/browse/IMPALA-8710 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Increase the allowed bit width for bit packing and bit unpacking to 64 bits. This is needed to support Parquet Delta Encoding. Also add new methods to BitWriter and BatchedBitReader handling Uleb and ZigZag integers for 64 bits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
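For reference, here is a minimal sketch of the 64-bit ZigZag and ULEB128 encodings that the new BitWriter/BatchedBitReader methods are meant to handle. Parquet's DELTA_BINARY_PACKED encoding stores its header fields as ULEB128 and its signed values (first value, minimum delta) as ZigZag-encoded ULEB128. The function names below are hypothetical and do not reflect Impala's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not Impala's BitWriter/BatchedBitReader methods.

// ZigZag maps signed to unsigned so small magnitudes get small codes:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint64_t ZigZagEncode64(int64_t v) {
  const uint64_t u = static_cast<uint64_t>(v);
  // (0 - sign bit) is all ones for negative v and zero otherwise.
  return (u << 1) ^ (uint64_t{0} - (u >> 63));
}

inline int64_t ZigZagDecode64(uint64_t u) {
  return static_cast<int64_t>((u >> 1) ^ (uint64_t{0} - (u & 1)));
}

// Writes v as ULEB128: 7 payload bits per byte, lowest group first, high bit
// set on every byte except the last. `out` needs room for 10 bytes.
inline int PutUleb128(uint64_t v, uint8_t* out) {
  int n = 0;
  do {
    uint8_t byte = static_cast<uint8_t>(v & 0x7F);
    v >>= 7;
    if (v != 0) byte |= 0x80;
    out[n++] = byte;
  } while (v != 0);
  return n;
}

// Reads a ULEB128 value from at most `len` bytes. Returns the number of bytes
// consumed, or 0 if the input is truncated or longer than 10 bytes.
inline int GetUleb128(const uint8_t* in, int len, uint64_t* v) {
  uint64_t result = 0;
  for (int i = 0; i < len && i < 10; ++i) {
    result |= static_cast<uint64_t>(in[i] & 0x7F) << (7 * i);
    if ((in[i] & 0x80) == 0) {
      *v = result;
      return i + 1;
    }
  }
  return 0;
}
```

A 64-bit value needs up to 10 ULEB128 bytes (ceil(64 / 7)), compared with 5 for 32-bit values, which is one reason the 32-bit readers cannot simply be reused.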
[jira] [Created] (IMPALA-8726) Autovectorisation leads to worse performance in bit unpacking
Daniel Becker created IMPALA-8726: - Summary: Autovectorisation leads to worse performance in bit unpacking Key: IMPALA-8726 URL: https://issues.apache.org/jira/browse/IMPALA-8726 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Attachments: no_vector.png The compiler (GCC 4.9.2) autovectorises bit unpacking for bit widths 1, 2, 4 and 8 (function BitPacking::UnpackValues), but this actually results in worse performance (see the attached graph). We should consider whether it is worth disabling autovectorisation for bit unpacking, although future compiler versions may do a better job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
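To make the discussion concrete, here is a hypothetical scalar sketch for the bit widths named in IMPALA-8726 (1, 2, 4 and 8), where each value lies within a single byte. It is not `BitPacking::UnpackValues`; it only shows the shape of a simple, branch-free loop that GCC is free to autovectorise. Values are assumed to be packed LSB-first, as in Parquet's bit-packed runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not Impala's BitPacking::UnpackValues: unpack
// LSB-first bit-packed values whose width divides 8, so every value lies
// inside one input byte. One output byte per value.
template <int BIT_WIDTH>
void UnpackSubByte(const uint8_t* in, int64_t num_values, uint8_t* out) {
  static_assert(BIT_WIDTH == 1 || BIT_WIDTH == 2 || BIT_WIDTH == 4 || BIT_WIDTH == 8,
                "BIT_WIDTH must divide 8");
  assert(num_values >= 0);
  constexpr int kValuesPerByte = 8 / BIT_WIDTH;
  constexpr unsigned kMask = (1u << BIT_WIDTH) - 1;
  for (int64_t i = 0; i < num_values; ++i) {
    // Position of value i inside its byte, counted from the least significant bit.
    const int shift = static_cast<int>(i % kValuesPerByte) * BIT_WIDTH;
    out[i] = static_cast<uint8_t>((in[i / kValuesPerByte] >> shift) & kMask);
  }
}
```

When benchmarking, GCC's `-fno-tree-vectorize` flag turns off loop autovectorisation for a whole translation unit, which makes it easy to compare both code paths before deciding whether to disable it permanently.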
[jira] [Resolved] (IMPALA-8710) Increase allowed bit width to 64 for bit packing
[ https://issues.apache.org/jira/browse/IMPALA-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8710. --- Resolution: Implemented > Increase allowed bit width to 64 for bit packing > > > Key: IMPALA-8710 > URL: https://issues.apache.org/jira/browse/IMPALA-8710 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Increase the allowed bit width for bit packing and bit unpacking to 64 > bits. This is needed to support Parquet Delta Encoding. > > Also add new methods to BitWriter and BatchedBitReader handling Uleb and > ZigZag integers for 64 bits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8741) Speed up bit unpacking by vectorisation
Daniel Becker created IMPALA-8741: - Summary: Speed up bit unpacking by vectorisation Key: IMPALA-8741 URL: https://issues.apache.org/jira/browse/IMPALA-8741 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Using compiler intrinsics for SIMD and bit manipulation instructions (AVX, AVX2 and BMI2), we can speed up bit unpacking by a factor of about 2 to 8 depending on bit width, at most 16. We need to take care to check that the required instructions are supported by the CPU the impalad is running on and fall back to the scalar implementation if not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
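A minimal sketch of the runtime CPU check with scalar fallback described in IMPALA-8741, assuming GCC or Clang on x86-64. To stay short, it swaps in a single BMI2 `PDEP` instruction (`_pdep_u64`) for the AVX/AVX2 code the issue has in mind, and it only handles bit widths 1 to 8 unpacked into bytes; all function names are hypothetical. The fast path is compiled with `__attribute__((target("bmi2")))` so the rest of the binary does not require BMI2, and `__builtin_cpu_supports` selects it only when the CPU reports the feature.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_BMI2_PATH 1
#endif

// Unpacks num_values LSB-first values of bit_width (1..8) into one byte each.
using UnpackFn = void (*)(const uint8_t* in, int bit_width, int64_t num_values,
                          uint8_t* out);

// Portable scalar fallback.
void UnpackBytesScalar(const uint8_t* in, int bit_width, int64_t num_values,
                       uint8_t* out) {
  assert(bit_width >= 1 && bit_width <= 8);
  const unsigned mask = (1u << bit_width) - 1;
  for (int64_t i = 0; i < num_values; ++i) {
    const int64_t bit_pos = i * bit_width;
    const int64_t byte = bit_pos / 8;
    const int shift = static_cast<int>(bit_pos % 8);
    unsigned window = in[byte];
    // Touch the next byte only if the value straddles a byte boundary.
    if (shift + bit_width > 8) window |= static_cast<unsigned>(in[byte + 1]) << 8;
    out[i] = static_cast<uint8_t>((window >> shift) & mask);
  }
}

#ifdef SKETCH_HAVE_BMI2_PATH
// BMI2 path: 8 values of bit_width bits occupy exactly bit_width bytes, and
// PDEP scatters them into the low bits of 8 output bytes in one instruction.
__attribute__((target("bmi2")))
void UnpackBytesBmi2(const uint8_t* in, int bit_width, int64_t num_values,
                     uint8_t* out) {
  assert(bit_width >= 1 && bit_width <= 8);
  const uint64_t mask = 0x0101010101010101ULL * ((1ULL << bit_width) - 1);
  int64_t i = 0;
  for (; i + 8 <= num_values; i += 8) {
    uint64_t packed = 0;
    std::memcpy(&packed, in + (i / 8) * bit_width, bit_width);  // x86 is little-endian
    const uint64_t spread = _pdep_u64(packed, mask);
    std::memcpy(out + i, &spread, sizeof(spread));
  }
  // The tail (fewer than 8 values) starts on a byte boundary; reuse the scalar code.
  UnpackBytesScalar(in + (i / 8) * bit_width, bit_width, num_values - i, out + i);
}
#endif

// Picks an implementation once at runtime and falls back to the scalar code
// when the CPU, compiler or platform does not support the fast path.
UnpackFn SelectUnpackBytes() {
#ifdef SKETCH_HAVE_BMI2_PATH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("bmi2")) return UnpackBytesBmi2;
#endif
  return UnpackBytesScalar;
}
```

Selecting the function pointer once, rather than testing the CPU on every call, keeps the check out of the hot loop. Note that on some AMD CPUs before Zen 3, `PDEP` is microcoded and slow, so feature detection alone does not guarantee that the fast path is faster.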
[jira] [Created] (IMPALA-8796) Add unit tests to UnpackAndDecodeValues
Daniel Becker created IMPALA-8796: - Summary: Add unit tests to UnpackAndDecodeValues Key: IMPALA-8796 URL: https://issues.apache.org/jira/browse/IMPALA-8796 Project: IMPALA Issue Type: Test Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker BitPacking::UnpackAndDecodeValues has no unit tests in bit-packing-test.cc. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (IMPALA-8833) Check failed: bit_width <= sizeof(T) * 8 (40 vs. 32) in BatchedBitReader::UnpackBatch()
[ https://issues.apache.org/jira/browse/IMPALA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8833. --- Resolution: Fixed > Check failed: bit_width <= sizeof(T) * 8 (40 vs. 32) in > BatchedBitReader::UnpackBatch() > > > Key: IMPALA-8833 > URL: https://issues.apache.org/jira/browse/IMPALA-8833 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build, crash, flaky > > {noformat} > F0801 21:24:10.571285 15993 bit-stream-utils.inline.h:126] > d04ba69d5da8ffd1:a9045b820001] Check failed: bit_width <= sizeof(T) * 8 > (40 vs. 32) > *** Check failure stack trace: *** > @ 0x52f63ac google::LogMessage::Fail() > @ 0x52f7c51 google::LogMessage::SendToLog() > @ 0x52f5d86 google::LogMessage::Flush() > @ 0x52f934d google::LogMessageFatal::~LogMessageFatal() > @ 0x2b265b5 impala::BatchedBitReader::UnpackBatch<>() > @ 0x2ae8623 impala::RleBatchDecoder<>::FillLiteralBuffer() > @ 0x2b2cadb impala::RleBatchDecoder<>::DecodeLiteralValues<>() > @ 0x2b27bfb impala::DictDecoder<>::DecodeNextValue() > @ 0x2b16fed > impala::ScalarColumnReader<>::ReadSlotsNoConversion() > @ 0x2ac7252 impala::ScalarColumnReader<>::ReadSlots() > @ 0x2a76cef > impala::ScalarColumnReader<>::MaterializeValueBatchRepeatedDefLevel() > @ 0x2a58faa impala::ScalarColumnReader<>::ReadValueBatch<>() > @ 0x2a20e8e > impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch() > @ 0x29b189c impala::HdfsParquetScanner::AssembleRows() > @ 0x29ac6de impala::HdfsParquetScanner::GetNextInternal() > @ 0x29aa656 impala::HdfsParquetScanner::ProcessSplit() > @ 0x249172d impala::HdfsScanNode::ProcessSplit() > @ 0x2490902 impala::HdfsScanNode::ScannerThread() > @ 0x248fc8b > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x2492253 > {noformat} > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/6915 > Log 
lines around the failure: > {noformat} > [gw5] PASSED > query_test/test_scanners.py::TestParquet::test_bad_compression_codec[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': > 0} | table_format: parquet/none] > query_test/test_nested_types.py::TestMaxNestingDepth::test_load_hive_table[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > query_test/test_scanners.py::TestParquet::test_bad_compression_codec[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'debug_action': > '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > [gw1] PASSED > query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q7[protocol: > beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q8[protocol: > beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > [gw1] PASSED > query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q8[protocol: > beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > 
parquet/none] > query_test/test_tpcds_queries.py::TestTpcdsQuery::test_tpcds_q10a[protocol: > beeswax | exec_option: {'decimal_v2': 0, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > [gw10] PASSED > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_decimal_tbl[protocol: > bee
[jira] [Created] (IMPALA-8843) Restrict bit unpacking to unsigned integer types
Daniel Becker created IMPALA-8843: - Summary: Restrict bit unpacking to unsigned integer types Key: IMPALA-8843 URL: https://issues.apache.org/jira/browse/IMPALA-8843 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Daniel Becker Restrict bit unpacking to the unsigned integer types uint8_t, uint16_t, uint32_t and uint64_t. Unpacking to these types is straightforward; unpacking to signed types is less so. Instead of bool, we can use uint8_t and possibly cast it to bool. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
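One way to express the restriction in IMPALA-8843 is a compile-time check on the output type. The sketch below is hypothetical (a bit-at-a-time `UnpackAt` helper, not Impala's unpacking code); it rejects any type other than the four unsigned ones and shows the suggested `bool` handling: unpack to `uint8_t`, then convert.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not Impala's API: read value `i` of `bit_width` bits
// from an LSB-first bit-packed buffer. OutType is restricted at compile time
// to the four unsigned integer types named in the issue.
template <typename OutType>
OutType UnpackAt(const uint8_t* in, int bit_width, int64_t i) {
  static_assert(std::is_same<OutType, uint8_t>::value ||
                    std::is_same<OutType, uint16_t>::value ||
                    std::is_same<OutType, uint32_t>::value ||
                    std::is_same<OutType, uint64_t>::value,
                "bit unpacking only supports uint8_t, uint16_t, uint32_t and uint64_t");
  // Same invariant as the "bit_width <= sizeof(T) * 8" check in IMPALA-8833.
  assert(bit_width >= 0 && bit_width <= static_cast<int>(sizeof(OutType) * 8));
  uint64_t result = 0;
  for (int b = 0; b < bit_width; ++b) {
    const int64_t pos = i * bit_width + b;
    result |= static_cast<uint64_t>((in[pos / 8] >> (pos % 8)) & 1) << b;
  }
  return static_cast<OutType>(result);
}
```

With this guard, instantiating `UnpackAt<int32_t>` or `UnpackAt<bool>` fails at compile time with the `static_assert` message, while a boolean flag is read as `UnpackAt<uint8_t>(in, 1, i) != 0`.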
[jira] [Resolved] (IMPALA-8840) Check failed: num_bytes <= sizeof(T) (5 vs. 4)
[ https://issues.apache.org/jira/browse/IMPALA-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8840. --- Resolution: Fixed > Check failed: num_bytes <= sizeof(T) (5 vs. 4) > --- > > Key: IMPALA-8840 > URL: https://issues.apache.org/jira/browse/IMPALA-8840 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Xiaomeng Zhang >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build, crash > > Not sure if this is due to same issue as > https://issues.apache.org/jira/browse/IMPALA-8833#, the error message is a > little different. > {code:java} > F0805 18:48:08.737411 5488 bit-stream-utils.inline.h:173] > 284731e5d1aad693:05c883020001] Check failed: num_bytes <= sizeof(T) (8 > vs. 4) > *** Check failure stack trace: *** > @ 0x52fb9bc google::LogMessage::Fail() > @ 0x52fd261 google::LogMessage::SendToLog() > @ 0x52fb396 google::LogMessage::Flush() > @ 0x52fe95d google::LogMessageFatal::~LogMessageFatal() > @ 0x2b2b867 impala::BatchedBitReader::GetBytes<>() > @ 0x2aeda65 impala::RleBatchDecoder<>::NextCounts() > @ 0x2a82896 impala::RleBatchDecoder<>::NextNumRepeats() > @ 0x2b1927f impala::ScalarColumnReader<>::ReadSlotsNoConversion() > @ 0x2ac7c2c impala::ScalarColumnReader<>::ReadSlots() > @ 0x2a7b861 > impala::ScalarColumnReader<>::MaterializeValueBatchRepeatedDefLevel() > @ 0x2a5b3b0 impala::ScalarColumnReader<>::ReadValueBatch<>() > @ 0x2a256a4 impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch() > @ 0x29b6eb6 impala::HdfsParquetScanner::AssembleRows() > @ 0x29b1cf8 impala::HdfsParquetScanner::GetNextInternal() > @ 0x29afc70 impala::HdfsParquetScanner::ProcessSplit() > @ 0x2494bc3 impala::HdfsScanNode::ProcessSplit() > @ 0x2493d98 impala::HdfsScanNode::ScannerThread() > @ 0x2493121 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x24956e9 > 
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x1ea0241 boost::function0<>::operator()() > @ 0x23de77a impala::Thread::SuperviseThread() > @ 0x23e6afe boost::_bi::list5<>::operator()<>() > @ 0x23e6a22 boost::_bi::bind_t<>::operator()() > @ 0x23e69e5 boost::detail::thread_data<>::run() > @ 0x4224819 thread_proxy > @ 0x7fc1818c5e24 start_thread > @ 0x7fc17e01f34c __clone > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8846) Undefined behaviour in RleEncoder::Put
Daniel Becker created IMPALA-8846: - Summary: Undefined behaviour in RleEncoder::Put Key: IMPALA-8846 URL: https://issues.apache.org/jira/browse/IMPALA-8846 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker On line [https://github.com/apache/impala/blob/4000da35be69e469500f5f11e0e5fdec119cf5c7/be/src/util/rle-encoding.h#L346], we test repeat_count_ <= std::numeric_limits::max(), which is always true (repeat_count_ is an int), and then increment repeat_count_. If repeat_count_ is already std::numeric_limits::max(), the increment overflows, which is undefined behaviour for signed integers. We should either change <= to <, or, if we think that this never happens, remove the misleading check. Correcting the check may cause a (probably small) performance regression, because the compiler can currently optimise the always-true check away. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
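The corrected guard can be sketched with a hypothetical run counter. This is not `RleEncoder::Put` itself, whose surrounding logic is omitted: testing with `<` guarantees the count is at most `INT_MAX - 1` before the increment, so the signed addition cannot overflow.

```cpp
#include <cassert>
#include <limits>

// Hypothetical sketch, not RleEncoder: extend a run only if the increment
// cannot overflow. With `<=` the test is always true for an int, and
// incrementing INT_MAX is undefined behaviour; `<` makes the guard real.
struct RunCounter {
  int repeat_count = 0;

  // Returns false when the run is full; a caller would then flush the run
  // and start a new one.
  bool TryExtend() {
    if (repeat_count < std::numeric_limits<int>::max()) {
      ++repeat_count;  // at most INT_MAX - 1 before the increment, so no overflow
      return true;
    }
    return false;
  }
};
```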
[jira] [Resolved] (IMPALA-8846) Undefined behaviour in RleEncoder::Put
[ https://issues.apache.org/jira/browse/IMPALA-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8846. --- Resolution: Fixed > Undefined behaviour in RleEncoder::Put > -- > > Key: IMPALA-8846 > URL: https://issues.apache.org/jira/browse/IMPALA-8846 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Attachments: original.txt, with_check.txt > > > On line > [https://github.com/apache/impala/blob/4000da35be69e469500f5f11e0e5fdec119cf5c7/be/src/util/rle-encoding.h#L346], > we test repeat_count_ <= std::numeric_limits::max(), which is > always true (repeat_count_ is an int), and then increment repeat_count_. If > repeat_count_ is already std::numeric_limits::max(), the increment overflows, > which is undefined behaviour for signed integers. > > We should either change <= to <, or, if we think that this never happens, > remove the misleading check. > Correcting the check may cause a (probably small) performance > regression, because the compiler can currently optimise the always-true check away. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8796) Add unit tests to UnpackAndDecodeValues
[ https://issues.apache.org/jira/browse/IMPALA-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8796. --- Resolution: Fixed > Add unit tests to UnpackAndDecodeValues > --- > > Key: IMPALA-8796 > URL: https://issues.apache.org/jira/browse/IMPALA-8796 > Project: IMPALA > Issue Type: Test > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Minor > > BitPacking::UnpackAndDecodeValues has no unit tests in bit-packing-test.cc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8710) Increase allowed bit width to 64 for bit packing
[ https://issues.apache.org/jira/browse/IMPALA-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8710. --- Resolution: Fixed > Increase allowed bit width to 64 for bit packing > > > Key: IMPALA-8710 > URL: https://issues.apache.org/jira/browse/IMPALA-8710 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Increase the allowed bit width for bit packing and bit unpacking to 64 > bits. This is needed to support Parquet Delta Encoding. > > Also add new methods to BitWriter and BatchedBitReader handling Uleb and > ZigZag integers for 64 bits. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8843) Restrict bit unpacking to unsigned integer types
[ https://issues.apache.org/jira/browse/IMPALA-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-8843. --- Resolution: Fixed > Restrict bit unpacking to unsigned integer types > > > Key: IMPALA-8843 > URL: https://issues.apache.org/jira/browse/IMPALA-8843 > Project: IMPALA > Issue Type: Improvement >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Minor > > Restrict bit unpacking to the unsigned integer types uint8_t, uint16_t, > uint32_t and uint64_t. Unpacking to these types is straightforward; unpacking to > signed types is less so. Instead of bool, we can use uint8_t and possibly > cast it to bool. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails
Daniel Becker created IMPALA-9111: - Summary: Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails Key: IMPALA-9111 URL: https://issues.apache.org/jira/browse/IMPALA-9111 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker Starting the Impala cluster with ``` bin/start-impala-cluster.py --impalad_args="-disable_optimization_passes" ``` the following query fails and Impala crashes: ``` SELECT d28_1 FROM functional.decimal_rtf_tbl ORDER BY d28_1; ``` This error happens if the inlining pass in OptimizeModule in be/src/codegen/llvm-codegen.cc is not run. It seems the problem only happens with decimals that need to be stored on 16 bytes. Maybe it is some ABI incompatibility with Decimal16Value. Stack trace: #0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x7fda6e64002a in __GI_abort () at abort.c:89 #2 0x7fda71707149 in os::abort(bool) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #3 0x7fda718bad27 in VMError::report_and_die() () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #4 0x7fda71710e4f in JVM_handle_linux_signal () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #6 #7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, impala::ColumnType const&) () #8 0x7fd9c3438e25 in Compare () #9 0x02a26293 in impala::TupleRowComparator::Compare (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at be/src/util/tuple-row-compare.h:98 #10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at be/src/util/tuple-row-compare.h:107 #11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72 #12 0x02a27409 in 
impala::Sorter::TupleSorter::MedianOfThree (this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at be/src/runtime/sorter-ir.cc:214 #13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot (this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206 #14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper (this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:165 #15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, run=0x13974da0) at be/src/runtime/sorter.cc:755 #16 0x02a18e27 in impala::Sorter::SortCurrentInputRun (this=0x1284e3c0) at be/src/runtime/sorter.cc:956 #17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at be/src/runtime/sorter.cc:892 #18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, state=0x11e652a0) at be/src/exec/sort-node.cc:187 #19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, state=0x11e652a0) at be/src/exec/sort-node.cc:90 #20 0x020f289a in impala::FragmentInstanceState::Open (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:348 #21 0x020ef54c in impala::FragmentInstanceState::Exec (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:84 #22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, fis=0xe0571e0) at be/src/runtime/query-state.cc:650 #23 0x02101268 in impala::QueryStateoperator()(void) const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558 #24 0x02104c7d in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) 
at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153 #25 0x01f04b46 in boost::function0::operator() (this=0x7fd9c3c4bca0) at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767 #26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (Python Exception No type named class std::basic_string, std::allocator >::_Rep.: name=, Python Exception No type named class std::basic_string, std::allocator >::_Rep.: category=, functor=..., parent_thread_info=0x7fd9c4c4d950, thread_started=0x7fd9c4c4c8f0) at be/src/util/thread.cc:360 #27 0x02483e81 in boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0xd3857c0, f=@0xd3857b8: 0x247b796 , impala::ThreadDebugInfo con
[jira] [Created] (IMPALA-9394) ASAN crash in exhaustive test
Daniel Becker created IMPALA-9394: - Summary: ASAN crash in exhaustive test Key: IMPALA-9394 URL: https://issues.apache.org/jira/browse/IMPALA-9394 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker Attachments: impalad.ERROR In ASAN builds, running the test below crashes Impala: {code:java} ./bin/impala-py.test tests/query_test/test_tablesample.py::TestTableSample::test_tablesample --exploration_strategy=exhaustive{code} The crash happens at {code:java} table_format: text/lzo/block{code} See the attachment for the error log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9394) ASAN crash in exhaustive test
[ https://issues.apache.org/jira/browse/IMPALA-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-9394. --- Resolution: Fixed > ASAN crash in exhaustive test > - > > Key: IMPALA-9394 > URL: https://issues.apache.org/jira/browse/IMPALA-9394 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Priority: Major > Attachments: impalad.ERROR > > > In ASAN builds, running the test below crashes Impala: > {code:java} > ./bin/impala-py.test > tests/query_test/test_tablesample.py::TestTableSample::test_tablesample > --exploration_strategy=exhaustive{code} > The crash happens at > {code:java} > table_format: text/lzo/block{code} > See the attachment for the error log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IMPALA-8741) Speed up bit unpacking by vectorisation
[ https://issues.apache.org/jira/browse/IMPALA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker closed IMPALA-8741. - Resolution: Abandoned > Speed up bit unpacking by vectorisation > --- > > Key: IMPALA-8741 > URL: https://issues.apache.org/jira/browse/IMPALA-8741 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Attachments: demo3.png > > > Using compiler intrinsics for SIMD and bit manipulation instructions (AVX, > AVX2 and BMI2), we can speed up bit unpacking by a factor of about 2 to 8 > depending on bit width, at most 16. > We need to take care to check that the required instructions are supported by > the CPU the impalad is running on and fall back to the scalar implementation > if not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9747) More fine-grained codegen for text file scanners
[ https://issues.apache.org/jira/browse/IMPALA-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-9747. --- Resolution: Implemented > More fine-grained codegen for text file scanners > > > Key: IMPALA-9747 > URL: https://issues.apache.org/jira/browse/IMPALA-9747 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Daniel Becker >Priority: Major > > Currently, if the materialization of any column cannot be codegend for some > reason (e.g. it is CHAR(N)), codegen is cancelled for the whole text > scanner, see: > https://github.com/apache/impala/blob/b5805de3e65fd1c7154e4169b323bb38ddc54f4f/be/src/exec/text-converter.cc#L112 > https://github.com/apache/impala/blob/58273fff601dcc763ac43f7cc275a174a2e18b6b/be/src/exec/hdfs-scanner.cc#L342 > It would be much better to use the non-codegend path only for the problematic > columns, use the codegend materialization for the rest, and always do > conjunct evaluation with codegen. > The codegend path orders slots based on the conjuncts that use them and > evaluates each conjunct as soon as the slots it needs become available, so if the row > is dropped, the remaining slots do not need to be materialized. A > simple solution would be to always do non-codegend slot materialization first > so that those slots are ready if a conjunct needs them. Moving the columns that are > not used by conjuncts to the end could be a further optimization. > This came up during the materialization of BINARY columns, which need > base64 decoding during materialization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
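The ordering proposed in IMPALA-9747 can be modelled in a few lines. This is an illustrative sketch with hypothetical `Slot` and `Conjunct` types, not the text scanner's codegen: non-codegend slots come first, codegend slots follow in the order in which conjuncts first reference them, and slots that no conjunct reads go last (the further optimization mentioned in the issue).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of the proposal; names are illustrative, not Impala's.
struct Slot {
  int id;
  bool codegen_supported;  // false e.g. for CHAR(N) in the issue's example
};

struct Conjunct {
  std::vector<int> slot_ids;  // slots the predicate reads
};

// Returns slot ids in the order they would be materialized:
// 1. slots that need the non-codegen path, so any conjunct can rely on them;
// 2. codegend slots in the order conjuncts first reference them, so each
//    conjunct can be evaluated as soon as its inputs exist;
// 3. slots that no conjunct reads, so a dropped row never materializes them.
std::vector<int> MaterializationOrder(const std::vector<Slot>& slots,
                                      const std::vector<Conjunct>& conjuncts) {
  std::vector<int> order;
  std::set<int> placed;
  for (const Slot& s : slots) {
    if (!s.codegen_supported && placed.insert(s.id).second) order.push_back(s.id);
  }
  for (const Conjunct& c : conjuncts) {
    for (int id : c.slot_ids) {
      if (placed.insert(id).second) order.push_back(id);
    }
  }
  for (const Slot& s : slots) {
    if (placed.insert(s.id).second) order.push_back(s.id);
  }
  return order;
}
```

For example, with a non-codegend slot 1, conjuncts reading {2} and {1, 0}, and an unused slot 3, the order is 1, 2, 0, 3: the first conjunct can run after two slots, and slot 3 is only materialized for rows that survive both conjuncts.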
[jira] [Resolved] (IMPALA-7923) DecimalValue should be marked as packed
[ https://issues.apache.org/jira/browse/IMPALA-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-7923. --- Resolution: Fixed > DecimalValue should be marked as packed > --- > > Key: IMPALA-7923 > URL: https://issues.apache.org/jira/browse/IMPALA-7923 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Major > > IMPALA-7473 was a symptom of a more general problem that DecimalValue is not > guaranteed to be aligned by the Impala runtime, but the class is not marked > as packed and, under some circumstances, GCC will emit code for aligned loads > to value_ when value_ is an int128. > Testing helps confirm that the compiler does not emit the problematic loads > in practice, but it would be better to mark the struct as packed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
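A minimal sketch of what marking the type as packed buys, assuming GCC or Clang on a 64-bit target (for `__int128` and `__attribute__((packed))`). `PackedDecimal16` and `TupleWithDecimal` are hypothetical stand-ins, not Impala's `DecimalValue`. Packing drops the struct's alignment to 1, so the compiler can no longer assume `value_` is 16-byte aligned and must emit loads and stores that are safe at any address, such as the odd offset in the tuple-like layout below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC/Clang extension; __extension__ silences -pedantic warnings.
__extension__ typedef __int128 int128_t;

// Without `packed`, a struct holding an __int128 is 16-byte aligned on
// x86-64, so the compiler may emit aligned 16-byte loads/stores for value_.
struct __attribute__((packed)) PackedDecimal16 {
  int128_t value_;
};
static_assert(alignof(PackedDecimal16) == 1, "packed: no alignment assumption");
static_assert(sizeof(PackedDecimal16) == 16, "packing adds no bytes here");

// A tuple-like layout that places the decimal at an odd offset, as a runtime
// that does not align slots might.
struct __attribute__((packed)) TupleWithDecimal {
  uint8_t null_bits;
  PackedDecimal16 d;
};
static_assert(offsetof(TupleWithDecimal, d) == 1, "decimal is misaligned");

// Access through a pointer is safe: the compiler must assume alignment 1.
inline int128_t GetValue(const PackedDecimal16* d) { return d->value_; }
```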
[jira] [Resolved] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails
[ https://issues.apache.org/jira/browse/IMPALA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-9111. --- Resolution: Fixed > Sorting 'Decimal16Value's with codegen enabled but codegen optimizations > disabled fails > --- > > Key: IMPALA-9111 > URL: https://issues.apache.org/jira/browse/IMPALA-9111 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Minor > Labels: crash > > Starting the Impala cluster with > {code:java} > bin/start-impala-cluster.py > --impalad_args="-disable_optimization_passes"{code} > > the following query fails and Impala crashes: > > {code:java} > SELECT d28_1 > FROM functional.decimal_rtf_tbl ORDER BY d28_1;{code} > > This error happens if the inlining pass in OptimizeModule in > be/src/codegen/llvm-codegen.cc is not run. It seems the problem only happens > with decimals that need to be stored on 16 bytes. Maybe it is some ABI > incompatibility with Decimal16Value. 
> Stack trace: > {code:java} > #0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/unix/sysv/linux/raise.c:54 > #1 0x7fda6e64002a in __GI_abort () at abort.c:89 > #2 0x7fda71707149 in os::abort(bool) () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #3 0x7fda718bad27 in VMError::report_and_die() () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #4 0x7fda71710e4f in JVM_handle_linux_signal () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #6 > #7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, > impala::ColumnType const&) () > #8 0x7fd9c3438e25 in Compare () > #9 0x02a26293 in impala::TupleRowComparator::Compare > (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at > be/src/util/tuple-row-compare.h:98 > #10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, > this=0x1284e480) at be/src/util/tuple-row-compare.h:107 > #11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, > rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72 > #12 0x02a27409 in impala::Sorter::TupleSorter::MedianOfThree > (this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at > be/src/runtime/sorter-ir.cc:214 > #13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot > (this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206 > #14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper > (this=0x137b2000, begin=..., end=...) 
at be/src/runtime/sorter-ir.cc:165 > #15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, > run=0x13974da0) at be/src/runtime/sorter.cc:755 > #16 0x02a18e27 in impala::Sorter::SortCurrentInputRun > (this=0x1284e3c0) at be/src/runtime/sorter.cc:956 > #17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at > be/src/runtime/sorter.cc:892 > #18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, > state=0x11e652a0) at be/src/exec/sort-node.cc:187 > #19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, > state=0x11e652a0) at be/src/exec/sort-node.cc:90 > #20 0x020f289a in impala::FragmentInstanceState::Open > (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:348 > #21 0x020ef54c in impala::FragmentInstanceState::Exec > (this=0xe0571e0) at be/src/runtime/fragment-instance-state.cc:84 > #22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, > fis=0xe0571e0) at be/src/runtime/query-state.cc:650 > #23 0x02101268 in impala::QueryStateoperator()(void) > const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558 > #24 0x02104c7d in > boost::detail::function::void_function_obj_invoker0, > void>::invoke(boost::detail::function::function_buffer &) > (function_obj_ptr=...) > at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153 > #25 0x01f04b46 in boost::function0::operator() > (this=0x7fd9c3c4bca0) at > toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767 > #26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, > std::string const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) (Python Exception 'gdb.error'> No type named class std::basic_string std::char_traits, std::allocator >::_Rep.: > name=, Python Exception No type named class > std::basic_string, std::allocator > >::_Rep.: > category=, functor=...
[jira] [Resolved] (IMPALA-5444) Asynchronous code generation
[ https://issues.apache.org/jira/browse/IMPALA-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-5444. --- Resolution: Implemented > Asynchronous code generation > > > Key: IMPALA-5444 > URL: https://issues.apache.org/jira/browse/IMPALA-5444 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Michael Ho >Assignee: Daniel Becker >Priority: Minor > Labels: codegen > > Currently, codegen happens during the preparation phase of a query fragment. > In other words, the query fragment cannot start running until code > generation is complete. For some queries, code generation > takes a huge amount of time. While we should disable codegen in some exec > nodes if the planner can accurately estimate that running without > codegen will be better (e.g. the number of rows to process is relatively > small), we will still pay the price if, say, the stats are stale or the > estimate is off. > With async codegen, the idea is that we should run code generation in a > separate thread so that codegen is not on the critical path of query > execution. Once codegen completes for a fragment, we can atomically swap the > function pointers of compiled functions embedded in the exec nodes. All exec > nodes already support falling back to interpretation if the codegen'd > functions don't exist (i.e. the pointer to the compiled function is > NULL). In some cases, the query may run to completion > before codegen completes. Once IMPALA-3259 is fixed (if feasible), we should > be able to cancel the codegen execution. > Another thing to note is that we should be able to bound the codegen work to > a set of threads in a thread pool so as to control the CPU and memory resources > consumed by codegen. > Another potential extension of this decoupling is IMPALA-9660. -- This message was sent by Atlassian Jira (v8.3.4#803005)
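The function-pointer swap described above can be sketched as follows, assuming a single per-row function. All names are hypothetical, not Impala's. The exec node reads an atomic pointer before each call and takes the interpreted path while it is still null, and the codegen thread publishes the compiled function with a release store.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical signature of a per-row function an exec node calls.
using RowFn = int64_t (*)(int64_t);

// Interpreted fallback: always available (slower in a real engine).
int64_t InterpretedAddOne(int64_t x) { return x + 1; }

// Stand-in for the function the code generator would produce.
int64_t CompiledAddOne(int64_t x) { return x + 1; }

class ExecNodeSketch {
 public:
  // Uses the compiled function if the codegen thread has published one,
  // otherwise falls back to interpretation.
  int64_t Process(int64_t x) const {
    RowFn fn = codegend_fn_.load(std::memory_order_acquire);
    return fn != nullptr ? fn(x) : InterpretedAddOne(x);
  }

  // Called from the codegen thread when compilation finishes.
  void PublishCodegendFn(RowFn fn) {
    codegend_fn_.store(fn, std::memory_order_release);
  }

  bool UsingCodegen() const {
    return codegend_fn_.load(std::memory_order_acquire) != nullptr;
  }

 private:
  std::atomic<RowFn> codegend_fn_{nullptr};
};

// Processes rows while a separate thread "generates code". The result is the
// same whichever path each individual row happened to take.
int64_t RunQuery(int64_t num_rows) {
  ExecNodeSketch node;
  std::thread codegen_thread([&node] {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // pretend to compile
    node.PublishCodegendFn(&CompiledAddOne);
  });
  int64_t sum = 0;
  for (int64_t i = 0; i < num_rows; ++i) sum += node.Process(i);
  codegen_thread.join();  // a real implementation might cancel codegen instead
  return sum;
}
```

The acquire/release pair ensures that a thread which sees the new pointer also sees everything the codegen thread wrote before publishing it.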
[jira] [Resolved] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal
[ https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-7655. --- Resolution: Fixed > Codegen output for conditional functions (if,isnull, coalesce) is very > suboptimal > - > > Key: IMPALA-7655 > URL: https://issues.apache.org/jira/browse/IMPALA-7655 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Major > Labels: codegen, perf, performance > > https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation > involving an if() function was very slow, 10x slower than the equivalent > version using a case: > {noformat} > [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case > when l_orderkey is NULL then 1 else NULL end) from > tpch10_parquet.lineitem;summary; > NUM_NODES set to 1 > MT_DOP set to 1 > Query: select count(case when l_orderkey is NULL then 1 else NULL end) from > tpch10_parquet.lineitem > Query submitted at: 2018-10-04 11:17:31 (Coordinator: > http://tarmstrong-box:25000) > Query progress can be monitored at: > http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642 > +--+ > | count(case when l_orderkey is null then 1 else null end) | > +--+ > | 0| > +--+ > Fetched 1 row(s) in 0.51s > +--++--+--+++--+---+-+ > | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak > Mem | Est. 
Peak Mem | Detail | > +--++--+--+++--+---+-+ > | 01:AGGREGATE | 1 | 44.03ms | 44.03ms | 1 | 1 | 25.00 > KB | 10.00 MB | FINALIZE| > | 00:SCAN HDFS | 1 | 411.57ms | 411.57ms | 59.99M | -1 | 16.61 > MB | 88.00 MB | tpch10_parquet.lineitem | > +--++--+--+++--+---+-+ > [localhost:21000] default> set num_nodes=1; set mt_dop=1; select > count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary; > NUM_NODES set to 1 > MT_DOP set to 1 > Query: select count(if(l_orderkey is NULL, 1, NULL)) from > tpch10_parquet.lineitem > Query submitted at: 2018-10-04 11:23:07 (Coordinator: > http://tarmstrong-box:25000) > Query progress can be monitored at: > http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca26 > ++ > | count(if(l_orderkey is null, 1, null)) | > ++ > | 0 | > ++ > Fetched 1 row(s) in 1.01s > +--++--+--+++--+---+-+ > | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak > Mem | Est. Peak Mem | Detail | > +--++--+--+++--+---+-+ > | 01:AGGREGATE | 1 | 422.07ms | 422.07ms | 1 | 1 | 25.00 > KB | 10.00 MB | FINALIZE| > | 00:SCAN HDFS | 1 | 511.13ms | 511.13ms | 59.99M | -1 | 16.61 > MB | 88.00 MB | tpch10_parquet.lineitem | > +--++--+--+++--+---+-+ > {noformat} > It turns out that this is because we don't have good codegen support for > ConditionalFunction, and just fall back to emitting a call to the interpreted > path: > https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28 > See CaseExpr for an example of much better codegen support: > https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178 -- This message was sent by Atlassian Jira (v8.3.4#803005)
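As a rough C++ analogy (the class names are made up, and real Impala codegen emits LLVM IR rather than C++), the difference is between generated code that calls back into an interpreted expression tree for every row and code in which the conditional has been specialized into an inline branch. Both versions below compute count(if(l_orderkey is NULL, 1, NULL)) identically, but the first pays for several virtual calls per row.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

using NullableInt = std::optional<int64_t>;  // empty optional == SQL NULL

// Interpreted expression tree: every node is evaluated through a virtual call.
struct Expr {
  virtual ~Expr() = default;
  virtual NullableInt GetValue(const NullableInt& row) const = 0;
};

struct SlotRef : Expr {  // reads the column value
  NullableInt GetValue(const NullableInt& row) const override { return row; }
};

struct Literal : Expr {
  NullableInt v;
  explicit Literal(NullableInt v) : v(v) {}
  NullableInt GetValue(const NullableInt&) const override { return v; }
};

// if(child is null, then_expr, else_expr)
struct IfIsNull : Expr {
  std::unique_ptr<Expr> child, then_expr, else_expr;
  IfIsNull(std::unique_ptr<Expr> c, std::unique_ptr<Expr> t, std::unique_ptr<Expr> e)
      : child(std::move(c)), then_expr(std::move(t)), else_expr(std::move(e)) {}
  NullableInt GetValue(const NullableInt& row) const override {
    return !child->GetValue(row).has_value() ? then_expr->GetValue(row)
                                             : else_expr->GetValue(row);
  }
};

// Wrapper style: the "generated" loop just calls back into the tree per row.
int64_t CountViaInterpreter(const Expr& e, const std::vector<NullableInt>& col) {
  int64_t n = 0;
  for (const auto& row : col) n += e.GetValue(row).has_value() ? 1 : 0;
  return n;
}

// Specialized style: the conditional is inlined as a plain branch,
// with no virtual calls per row.
int64_t CountSpecialized(const std::vector<NullableInt>& col) {
  int64_t n = 0;
  for (const auto& row : col) n += !row.has_value() ? 1 : 0;
  return n;
}
```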
[jira] [Created] (IMPALA-9984) Implement codegen for TupleIsNullPredicate
Daniel Becker created IMPALA-9984: - Summary: Implement codegen for TupleIsNullPredicate Key: IMPALA-9984 URL: https://issues.apache.org/jira/browse/IMPALA-9984 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Daniel Becker IMPALA-7657 left codegen for TupleIsNullPredicate unimplemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9984) Implement codegen for TupleIsNullPredicate
[ https://issues.apache.org/jira/browse/IMPALA-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-9984. --- Resolution: Implemented > Implement codegen for TupleIsNullPredicate > -- > > Key: IMPALA-9984 > URL: https://issues.apache.org/jira/browse/IMPALA-9984 > Project: IMPALA > Issue Type: Improvement >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > IMPALA-7657 left codegen for TupleIsNullPredicate unimplemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10078) Proper codegen for KuduPartitionExpr
Daniel Becker created IMPALA-10078: -- Summary: Proper codegen for KuduPartitionExpr Key: IMPALA-10078 URL: https://issues.apache.org/jira/browse/IMPALA-10078 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Daniel Becker Implement codegen for KuduPartitionExpr and remove the use of GetCodegendComputeFnWrapper. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-7658) Proper codegen for HiveUdfCall
[ https://issues.apache.org/jira/browse/IMPALA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-7658. --- Resolution: Implemented > Proper codegen for HiveUdfCall > -- > > Key: IMPALA-7658 > URL: https://issues.apache.org/jira/browse/IMPALA-7658 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Major > Labels: codegen, performance > > This function uses GetCodegendComputeFnWrapper() to call the interpreted path, > but we could instead codegen the Evaluate() function to reduce the overhead. > I think this is likely to be a little involved since there's a loop to > unroll, so the solution might end up looking like IMPALA-5168. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-7656) Remove all uses of GetCodegendComputeFnWrapper()
[ https://issues.apache.org/jira/browse/IMPALA-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-7656. --- Resolution: Fixed > Remove all uses of GetCodegendComputeFnWrapper() > > > Key: IMPALA-7656 > URL: https://issues.apache.org/jira/browse/IMPALA-7656 > Project: IMPALA > Issue Type: Epic > Components: Backend >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Major > Labels: codegen > > We should really get rid of all uses of this function. It was a stopgap to > add codegen support to expressions without really doing the work, and its > output can be 10x slower than doing it properly (e.g. see IMPALA-7655). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10078) Proper codegen for KuduPartitionExpr
[ https://issues.apache.org/jira/browse/IMPALA-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10078. Resolution: Implemented > Proper codegen for KuduPartitionExpr > > > Key: IMPALA-10078 > URL: https://issues.apache.org/jira/browse/IMPALA-10078 > Project: IMPALA > Issue Type: Improvement >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Implement codegen for KuduPartitionExpr and remove the use of > GetCodegendComputeFnWrapper. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10196) Remove LlvmCodeGen::CastPtrToLlvmPtr
Daniel Becker created IMPALA-10196: -- Summary: Remove LlvmCodeGen::CastPtrToLlvmPtr Key: IMPALA-10196 URL: https://issues.apache.org/jira/browse/IMPALA-10196 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker LlvmCodeGen::CastPtrToLlvmPtr embeds a pointer to data in the current process's memory into codegen'd IR code. Our long-term goal is to share the codegen'd IR among processes working on the same fragment, which is not possible if the IR contains pointers to data in a specific process. One step towards making the IR independent of the process generating it is removing LlvmCodeGen::CastPtrToLlvmPtr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10196) Remove LlvmCodeGen::CastPtrToLlvmPtr
[ https://issues.apache.org/jira/browse/IMPALA-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10196. Resolution: Fixed > Remove LlvmCodeGen::CastPtrToLlvmPtr > > > Key: IMPALA-10196 > URL: https://issues.apache.org/jira/browse/IMPALA-10196 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > LlvmCodeGen::CastPtrToLlvmPtr embeds a pointer to data in the > current process's memory into codegen'd IR code. Our long-term goal is to > share the codegen'd IR among processes working on the same fragment, which is > not possible if the IR contains pointers to data in a specific > process. One step towards making the IR independent of the process generating it is > removing LlvmCodeGen::CastPtrToLlvmPtr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
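Why an embedded address blocks sharing can be illustrated with a C++ analogy. All types here are hypothetical; the actual change concerns LLVM IR emitted by LlvmCodeGen. Code with one process's address baked in is valid only in that process, whereas code that finds its data through a context passed in at run time can be reused by any process that supplies its own context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-process state that generated code needs to read. Each process (here:
// each ProcessState object) has its own copy at its own address.
struct ProcessState {
  std::vector<int64_t> constants;
};

// Not shareable: the "compiled" code has one process's address baked in,
// which is what embedding a host pointer into IR amounts to.
struct BakedInCode {
  const int64_t* baked_ptr;
  int64_t Run(int64_t x) const { return x + *baked_ptr; }
};

// Shareable: the code records only *which* entry it needs and receives the
// calling process's state as an argument.
struct ContextRelativeCode {
  size_t constant_index;
  int64_t Run(const ProcessState& state, int64_t x) const {
    return x + state.constants[constant_index];
  }
};
```

A single ContextRelativeCode value gives the same result with two different ProcessState objects, while a BakedInCode value stays tied to the object whose address it captured.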
[jira] [Created] (IMPALA-10332) Add file formats to HdfsScanNode's thrift representation and codegen for those
Daniel Becker created IMPALA-10332: -- Summary: Add file formats to HdfsScanNode's thrift representation and codegen for those Key: IMPALA-10332 URL: https://issues.apache.org/jira/browse/IMPALA-10332 Project: IMPALA Issue Type: Improvement Components: Backend, Frontend Reporter: Daniel Becker Assignee: Daniel Becker List all file formats that an HdfsScanNode needs to process in any fragment instance. It is possible that some file formats will not be needed in all fragment instances. This is a step towards sharing codegen between different Impala backends. Using the file formats provided in the thrift message, a backend can generate code for file formats that are not needed in its own process but are needed in other fragment instances running on other backends, and the resulting binary can be shared between multiple backends. Codegen for file formats will be done based on the thrift message and not on what the actual backend needs. This leads to some extra work when a file format is not needed by the current backend and codegen sharing is not available (it is not implemented yet). However, the overall number of such cases is low. Also add the file formats to the node's explain string. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10332) Add file formats to HdfsScanNode's thrift representation and codegen for those
[ https://issues.apache.org/jira/browse/IMPALA-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10332. Resolution: Implemented > Add file formats to HdfsScanNode's thrift representation and codegen for those > -- > > Key: IMPALA-10332 > URL: https://issues.apache.org/jira/browse/IMPALA-10332 > Project: IMPALA > Issue Type: Improvement > Components: Backend, Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > List all file formats that an HdfsScanNode needs to process in any fragment > instance. It is possible that some file formats will not be needed in all > fragment instances. > This is a step towards sharing codegen between different Impala backends. > Using the file formats provided in the thrift message, a backend can generate > code for file formats that are not needed in its own process but are needed > in other fragment instances running on other backends, and the resulting > binary can be shared between multiple backends. > Codegen for file formats will be done based on the thrift message and not > on what the actual backend needs. This leads to some extra work when > a file format is not needed by the current backend and codegen sharing > is not available (it is not implemented yet). However, the overall > number of such cases is low. > Also add the file formats to the node's explain string. -- This message was sent by Atlassian Jira (v8.3.4#803005)
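The planning-side idea can be sketched as a set union. This is a simplified, hypothetical model in which formats are strings keyed by fragment-instance id, not Impala's thrift types. The node records every format that any instance needs, and a backend that codegens for that full set does extra work for the formats it does not scan locally, which is the trade-off the ticket accepts.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: file formats each fragment instance of one
// HdfsScanNode has to scan, keyed by instance id.
using InstanceFormats = std::map<int, std::set<std::string>>;

// The set that would go into the node's plan representation:
// the union over all fragment instances.
std::set<std::string> FormatsForNode(const InstanceFormats& per_instance) {
  std::set<std::string> all;
  for (const auto& entry : per_instance) {
    all.insert(entry.second.begin(), entry.second.end());
  }
  return all;
}

// Formats a backend would codegen for without using them itself, i.e. the
// extra work incurred while codegen sharing is not available.
std::set<std::string> ExtraCodegenWork(const InstanceFormats& per_instance,
                                       int local_instance) {
  const std::set<std::string> all = FormatsForNode(per_instance);
  const auto it = per_instance.find(local_instance);
  std::set<std::string> extra;
  for (const std::string& f : all) {
    if (it == per_instance.end() || it->second.count(f) == 0) extra.insert(f);
  }
  return extra;
}
```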
[jira] [Resolved] (IMPALA-10371) test_java_udfs crash impalad if result spooling is enabled
[ https://issues.apache.org/jira/browse/IMPALA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10371. Resolution: Fixed > test_java_udfs crash impalad if result spooling is enabled > -- > > Key: IMPALA-10371 > URL: https://issues.apache.org/jira/browse/IMPALA-10371 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Riza Suminto >Assignee: Daniel Becker >Priority: Blocker > Attachments: 46a19881-resolved.txt, hs_err_pid12878.log > > > The following test query from TestUdfExecution::test_java_udfs crashes impalad > when result spooling is enabled. > {code:java} > select throws_exception() from functional.alltypestiny{code} > The following is a truncated JVM crash log related to the crash: > {code:java} > --- T H R E A D ---Current thread > (0x0fb4c000): JavaThread "Thread-700" [_thread_in_native, id=30853, > stack(0x7f79715ff000,0x7f7971dff000)]Stack: > [0x7f79715ff000,0x7f7971dff000], sp=0x7f7971dfa280, free > space=8172k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > V [libjvm.so+0xb6b032] > V [libjvm.so+0x4f14bd] > V [libjvm.so+0x80fa8f] > V [libjvm.so+0x7e0991] > V [libjvm.so+0x69fa10] > j > org.apache.impala.TestUdfException.evaluate()Lorg/apache/hadoop/io/BooleanWritable;+9 > v ~StubRoutines::call_stub > V [libjvm.so+0x6af9ba] > V [libjvm.so+0xa1def8] > V [libjvm.so+0xa1f8d5] > V [libjvm.so+0x7610f8] JVM_InvokeMethod+0x128 > J 2286 > sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (0 bytes) @ 0x7f7acb553ced [0x7f7acb553c00+0xed] > J 6921 C2 > sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (104 bytes) @ 0x7f7acbd1de38 [0x7f7acbd1ddc0+0x78] > J 3645 C2 org.apache.impala.hive.executor.UdfExecutor.evaluate()V (396 bytes) > @ 0x7f7acaf6e894 [0x7f7acaf6e640+0x254] > v ~StubRoutines::call_stub 
> V [libjvm.so+0x6af9ba] > V [libjvm.so+0x72c046] > V [libjvm.so+0x730523] > C 0x7f7ab4c5d0d0 > C [impalad+0x26a2648] > impala::ScalarExprEvaluator::GetValue(impala::ScalarExpr const&, > impala::TupleRow const*)+0x7a > C [impalad+0x26a25cb] > impala::ScalarExprEvaluator::GetValue(impala::TupleRow const*)+0x2b > C [impalad+0x21f4f78] > impala::AsciiQueryResultSet::AddRows(std::vector std::allocator > const&, impala::RowBatch*, > int, int)+0x4c2 > C [impalad+0x25c5862] > impala::BufferedPlanRootSink::GetNext(impala::RuntimeState*, > impala::QueryResultSet*, int, bool*, long)+0x70c > C [impalad+0x296cf17] impala::Coordinator::GetNext(impala::QueryResultSet*, > int, bool*, long)+0x557 > C [impalad+0x219f5fe] impala::ClientRequestState::FetchRowsInternal(int, > impala::QueryResultSet*, long)+0x6b2 > C [impalad+0x219d98e] impala::ClientRequestState::FetchRows(int, > impala::QueryResultSet*, long)+0x46 > C [impalad+0x21c1d29] > impala::ImpalaServer::FetchInternal(impala::TUniqueId, bool, int, > beeswax::Results*)+0x717 > C [impalad+0x21bbde9] impala::ImpalaServer::fetch(beeswax::Results&, > beeswax::QueryHandle const&, bool, int)+0x577 > {code} > If result spooling is enabled, BufferedPlanRootSink is used and > scalar expressions are evaluated in BufferedPlanRootSink::GetNext, leading > to this crash. > Without result spooling, BlockingPlanRootSink is used and > scalar expressions are evaluated in BlockingPlanRootSink::Send. No crash happens > when result spooling is disabled. > Attached are the full JVM crash log and the resolved minidump. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10640) Support reading Parquet Bloom filters - most common types
Daniel Becker created IMPALA-10640: -- Summary: Support reading Parquet Bloom filters - most common types Key: IMPALA-10640 URL: https://issues.apache.org/jira/browse/IMPALA-10640 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Support reading Parquet Bloom filters for the most common types: integers, float, double and Impala strings. Supporting these types is relatively easy in comparison to most other types. Support for other types may be added later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10641) Support reading Parquet Bloom filters - missing types
Daniel Becker created IMPALA-10641: -- Summary: Support reading Parquet Bloom filters - missing types Key: IMPALA-10641 URL: https://issues.apache.org/jira/browse/IMPALA-10641 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Daniel Becker This Jira tracks the addition of read support for Parquet Bloom filters for the types not dealt with in IMPALA-10640. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10642) Write support for Parquet Bloom filters - most common types
Daniel Becker created IMPALA-10642: -- Summary: Write support for Parquet Bloom filters - most common types Key: IMPALA-10642 URL: https://issues.apache.org/jira/browse/IMPALA-10642 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Support writing Parquet Bloom filters for the most common types: integers, float, double and Impala strings. Support for other types may be added later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10640) Support reading Parquet Bloom filters - most common types
[ https://issues.apache.org/jira/browse/IMPALA-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10640. Resolution: Implemented > Support reading Parquet Bloom filters - most common types > - > > Key: IMPALA-10640 > URL: https://issues.apache.org/jira/browse/IMPALA-10640 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: parquet > > Support reading Parquet Bloom filters for the most common types: integers, > float, double and Impala strings. Supporting these types is relatively easy > in comparison to most other types. Support for other types may be added later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10642) Write support for Parquet Bloom filters - most common types
[ https://issues.apache.org/jira/browse/IMPALA-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10642. Resolution: Implemented > Write support for Parquet Bloom filters - most common types > --- > > Key: IMPALA-10642 > URL: https://issues.apache.org/jira/browse/IMPALA-10642 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Support writing Parquet Bloom filters for the most common types: integers, > float, double and Impala strings. Support for other types may be added later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
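The Parquet format specification defines these filters as split block Bloom filters. Below is a minimal sketch of the write (insert) and read (lookup) paths; the class name is illustrative, not Impala's. It takes each value's 64-bit hash as input, whereas the specification computes that hash with xxHash64, and it omits the filter header and serialization.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Split block Bloom filter as described in the Parquet format spec: an array
// of 256-bit blocks, each made of eight 32-bit words.
class SplitBlockBloomFilter {
  using Block = std::array<uint32_t, 8>;

 public:
  explicit SplitBlockBloomFilter(uint32_t num_blocks) : blocks_(num_blocks) {
    assert(num_blocks > 0);
  }

  // `hash` is the value's 64-bit hash (xxHash64 in the spec; not computed here).
  void Insert(uint64_t hash) {
    Block& b = blocks_[BlockIndex(hash)];
    const Block m = Mask(static_cast<uint32_t>(hash));
    for (int i = 0; i < 8; ++i) b[i] |= m[i];
  }

  // Never returns false for an inserted hash; may return true for a hash
  // that was never inserted (a false positive).
  bool Find(uint64_t hash) const {
    const Block& b = blocks_[BlockIndex(hash)];
    const Block m = Mask(static_cast<uint32_t>(hash));
    for (int i = 0; i < 8; ++i) {
      if ((b[i] & m[i]) != m[i]) return false;
    }
    return true;
  }

 private:
  // The upper 32 bits of the hash select a block.
  uint32_t BlockIndex(uint64_t hash) const {
    return static_cast<uint32_t>(((hash >> 32) * blocks_.size()) >> 32);
  }

  // The lower 32 bits set exactly one bit in each of the eight words.
  static Block Mask(uint32_t key) {
    static constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU,
                                          0xa2b7289dU, 0x705495c7U, 0x2df1424bU,
                                          0x9efc4947U, 0x5c6bfb31U};
    Block m{};
    for (int i = 0; i < 8; ++i) {
      const uint32_t y = key * kSalt[i];
      m[i] = uint32_t{1} << (y >> 27);
    }
    return m;
  }

  std::vector<Block> blocks_;
};
```

Because every type shares these block operations once a value is hashed, per-type support mainly comes down to how each value is turned into its hash. This fits with the most common types being handled first.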
[jira] [Created] (IMPALA-10826) Test failure in TestEventProcessing.test_transactional_insert_events
Daniel Becker created IMPALA-10826: -- Summary: Test failure in TestEventProcessing.test_transactional_insert_events Key: IMPALA-10826 URL: https://issues.apache.org/jira/browse/IMPALA-10826 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.1 Reporter: Daniel Becker Assignee: Zoltán Borók-Nagy The test {code:java} custom_cluster.test_event_processing.TestEventProcessing.test_transactional_insert_events{code} failed after 3045f585dd64b8d92ba2f126264a0c0e20d4a4dd was merged. Stack trace: {code:java} custom_cluster/test_event_processing.py:99: in test_transactional_insert_events self.run_test_insert_events(unique_database, is_transactional=True) custom_cluster/test_event_processing.py:139: in run_test_insert_events assert data.split('\t') == ['101', '200'] E AttributeError: 'NoneType' object has no attribute 'split'{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10851) Codegen for structs
Daniel Becker created IMPALA-10851: -- Summary: Codegen for structs Key: IMPALA-10851 URL: https://issues.apache.org/jira/browse/IMPALA-10851 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Daniel Becker IMPALA-9495 adds support for struct types in SELECT lists, but only with codegen turned off. We should remove this restriction either by implementing codegen for struct types or by calling interpreted code from codegen'd code to handle structs. The latter option would still be better than turning off codegen completely, because other parts of the query that do not handle structs could still benefit from codegen. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10929) Optimise memory usage of structs in tuples
Daniel Becker created IMPALA-10929: -- Summary: Optimise memory usage of structs in tuples Key: IMPALA-10929 URL: https://issues.apache.org/jira/browse/IMPALA-10929 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Daniel Becker If we have both a whole struct and one of its members (or a member of a member etc.) in the select list, the whole struct and the member are assigned to different slots in the tuple. We could use less memory if the member expression used the slot within the whole struct instead. Example: For the query {code:java} select id, outer_struct from functional_orc_def.complextypes_nested_structs; {code} the row size is 64B, while for {code:java} select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs; {code} it is 80B, although it should not need more memory. This is not limited to the select list; it should also work with WHERE clauses etc. For example, the query {code:java} select id, outer_struct from functional_orc_def.complextypes_nested_structs where outer_struct.inner_struct2.i > 1; {code} should also have a row size of 64B instead of 68B. -- This message was sent by Atlassian Jira (v8.3.4#803005)
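One way to decide which expressions could reuse an existing slot is a component-wise prefix check on the referenced paths. This is a hypothetical planning sketch; the function names and the dotted-string path representation are not Impala's. It only encodes the rule the example queries suggest: a member needs no slot of its own when an enclosing struct is already materialized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a dotted path such as "outer_struct.inner_struct2.i" into parts.
std::vector<std::string> SplitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    const auto dot = path.find('.', start);
    parts.push_back(path.substr(start, dot - start));  // clamped at npos
    if (dot == std::string::npos) break;
    start = dot + 1;
  }
  return parts;
}

// True if `prefix` names a struct enclosing `path`, compared component-wise:
// "a.b" encloses "a.b.c", but "a.b" does not enclose "a.bc".
bool IsProperPrefix(const std::vector<std::string>& prefix,
                    const std::vector<std::string>& path) {
  if (prefix.size() >= path.size()) return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (prefix[i] != path[i]) return false;
  }
  return true;
}

// Returns the referenced paths that need their own slot: those with no
// enclosing struct among the other referenced paths.
std::vector<std::string> PathsNeedingOwnSlot(const std::vector<std::string>& refs) {
  std::vector<std::string> result;
  for (const std::string& r : refs) {
    bool covered = false;
    for (const std::string& other : refs) {
      if (IsProperPrefix(SplitPath(other), SplitPath(r))) covered = true;
    }
    if (!covered) result.push_back(r);
  }
  return result;
}
```

Under this rule, both `outer_struct.inner_struct2` in the select list and `outer_struct.inner_struct2.i` in the WHERE clause would read from the slot of `outer_struct` instead of getting a slot of their own.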
[jira] [Created] (IMPALA-10983) Test metadata.test_event_processing.TestEventProcessing.test_insert_events fails
Daniel Becker created IMPALA-10983: -- Summary: Test metadata.test_event_processing.TestEventProcessing.test_insert_events fails Key: IMPALA-10983 URL: https://issues.apache.org/jira/browse/IMPALA-10983 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Test {code:java} metadata.test_event_processing.TestEventProcessing.test_insert_events {code} fails: h3. Error Message {code:java} metadata/test_event_processing.py:48: in test_insert_events self.run_test_insert_events(unique_database) metadata/test_event_processing.py:128: in run_test_insert_events EventProcessorUtils.wait_for_event_processing(self) util/event_processor_utils.py:61: in wait_for_event_processing within {1} seconds".format(current_event_id, timeout)) E Exception: Event processor did not sync till last known event id 31772 within 10 seconds{code} h3. Stacktrace {code:java} metadata/test_event_processing.py:48: in test_insert_events self.run_test_insert_events(unique_database) metadata/test_event_processing.py:128: in run_test_insert_events EventProcessorUtils.wait_for_event_processing(self) util/event_processor_utils.py:61: in wait_for_event_processing within {1} seconds".format(current_event_id, timeout)) E Exception: Event processor did not sync till last known event id 31772 within 10 seconds{code} h3. 
Standard Error {code:java} SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events; -- connecting to: localhost:21000 -- connecting to localhost:21050 with impyla -- 2021-10-22 15:28:00,639 INFO MainThread: Closing active operation -- connecting to localhost:28000 with impyla -- 2021-10-22 15:28:00,665 INFO MainThread: Closing active operation -- connecting to localhost:11050 with impyla SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events; SET sync_ddl=False; -- executing against localhost:21000 DROP DATABASE IF EXISTS `test_insert_events_4293827b` CASCADE; -- 2021-10-22 15:28:03,902 INFO MainThread: Started query e74933cd62ece6eb:63eb6081 SET client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_insert_events; SET sync_ddl=False; -- executing against localhost:21000{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-11011) Impala crashes in OrcStructReader::NumElements()
Daniel Becker created IMPALA-11011: -- Summary: Impala crashes in OrcStructReader::NumElements() Key: IMPALA-11011 URL: https://issues.apache.org/jira/browse/IMPALA-11011 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker Running the query {code:java} select inner_arr.ITEM from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as inner_arr;{code} in a non-full-acid version/copy of functional_orc_def.complextypestbl crashes Impala because in OrcStructReader::NumElements() 'vbatch_' is NULL and we dereference it. Steps to reproduce: 1. Use Hive to create a non-full-acid copy of the table: * Enter the Hive cmd line: {code:java} hive beeline -u 'jdbc:hive2://localhost:11050/default'{code} * Copy the table with this command: {code:java} create table complextypestbl_non_acid stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from complextypestbl;{code} 2. In Impala, run the query on the copied table: {code:java} set disable_codegen=true; select inner_arr.ITEM from functional_orc_def.complextypestbl_non_acid.nested_struct.c.d.ITEM as inner_arr;{code} Call stack from GDB: {code:java} #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x7fd5e49e9921 in __GI_abort () at abort.c:79 #2 0x7fd5e7929589 in os::abort(bool) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #3 0x7fd5e7b04fb3 in VMError::report_and_die() () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #4 0x7fd5e7933ce4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #5 0x7fd5e79263b8 in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so #6 #7 0x02c3bd7f in impala::OrcStructReader::NumElements (this=0xf043290) at be/src/exec/orc-column-readers.h:603 #8 0x02c371b7 in impala::OrcListReader::NumElements (this=0x11009420) at 
be/src/exec/orc-column-readers.cc:563 #9 0x02c371b7 in impala::OrcListReader::NumElements (this=0x11009340) at be/src/exec/orc-column-readers.cc:563 #10 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf043200) at be/src/exec/orc-column-readers.h:606 #11 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf042ea0) at be/src/exec/orc-column-readers.h:606 #12 0x02c3be5b in impala::OrcStructReader::NumElements (this=0xf042e10) at be/src/exec/orc-column-readers.h:606 #13 0x02c3497f in impala::OrcStructReader::EndOfBatch (this=0xf042e10) at be/src/exec/orc-column-readers.cc:294 #14 0x02bf5389 in impala::HdfsOrcScanner::GetNextInternal (this=0xeca4000, row_batch=0xf1c95a0) at be/src/exec/hdfs-orc-scanner.cc:648 #15 0x02bf46b7 in impala::HdfsOrcScanner::ProcessSplit (this=0xeca4000) at be/src/exec/hdfs-orc-scanner.cc:588 #16 0x02d427ff in impala::HdfsScanNode::ProcessSplit (this=0xff85800, filter_ctxs=..., expr_results_pool=0x7fd41a29b4e0, scan_range=0xf2bde00, scanner_thread_reservation=0x7fd41a29b408) at be/src/exec/hdfs-scan-node.cc:500 #17 0x02d41b80 in impala::HdfsScanNode::ScannerThread (this=0xff85800, first_thread=false, scanner_thread_reservation=16384) at be/src/exec/hdfs-scan-node.cc:418 #18 0x02d40ee8 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7fd41a29bc08) at be/src/exec/hdfs-scan-node.cc:339 #19 0x02d43afb in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) 
at /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 #20 0x022de8ca in boost::function0::operator() (this=0x7fd41a29bc00) at /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 #21 0x02aa43a0 in impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7fd40f8858a0, thread_started=0x7fd40f8846a0) at be/src/util/thread.cc:360 #22 0x02aacd01 in boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, imp
[jira] [Resolved] (IMPALA-11011) Impala crashes in OrcStructReader::NumElements()
[ https://issues.apache.org/jira/browse/IMPALA-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11011. Resolution: Fixed > Impala crashes in OrcStructReader::NumElements() > > > Key: IMPALA-11011 > URL: https://issues.apache.org/jira/browse/IMPALA-11011 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Running the query > {code:java} > select inner_arr.ITEM > from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as > inner_arr;{code} > on a non-full-ACID copy of functional_orc_def.complextypestbl crashes Impala, > because 'vbatch_' is NULL in OrcStructReader::NumElements() and is dereferenced > there. > Steps to reproduce: > 1. Use Hive to create a non-full-ACID copy of the table: > * Start the Hive command line: > {code:java} > hive beeline -u 'jdbc:hive2://localhost:11050/default'{code} > * Copy the table with this command: > {code:java} > create table complextypestbl_non_acid stored as orc tblproperties > ("transactional"="true", "transactional_properties"="insert_only") as select > * from complextypestbl;{code} > 2.
In Impala, run the query on the copied table: > {code:java} > set disable_codegen=true; > select inner_arr.ITEM > from functional_orc_def.complextypestbl_non_acid.nested_struct.c.d.ITEM as > inner_arr;{code} > > Call stack from GDB: > {code:java} > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > #1 0x7fd5e49e9921 in __GI_abort () at abort.c:79 > #2 0x7fd5e7929589 in os::abort(bool) () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #3 0x7fd5e7b04fb3 in VMError::report_and_die() () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #4 0x7fd5e7933ce4 in JVM_handle_linux_signal () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #5 0x7fd5e79263b8 in signalHandler(int, siginfo_t*, void*) () from > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > #6 > #7 0x02c3bd7f in impala::OrcStructReader::NumElements > (this=0xf043290) at be/src/exec/orc-column-readers.h:603 > #8 0x02c371b7 in impala::OrcListReader::NumElements > (this=0x11009420) at be/src/exec/orc-column-readers.cc:563 > #9 0x02c371b7 in impala::OrcListReader::NumElements > (this=0x11009340) at be/src/exec/orc-column-readers.cc:563 > #10 0x02c3be5b in impala::OrcStructReader::NumElements > (this=0xf043200) at be/src/exec/orc-column-readers.h:606 > #11 0x02c3be5b in impala::OrcStructReader::NumElements > (this=0xf042ea0) at be/src/exec/orc-column-readers.h:606 > #12 0x02c3be5b in impala::OrcStructReader::NumElements > (this=0xf042e10) at be/src/exec/orc-column-readers.h:606 > #13 0x02c3497f in impala::OrcStructReader::EndOfBatch > (this=0xf042e10) at be/src/exec/orc-column-readers.cc:294 > #14 0x02bf5389 in impala::HdfsOrcScanner::GetNextInternal > (this=0xeca4000, row_batch=0xf1c95a0) at be/src/exec/hdfs-orc-scanner.cc:648 > #15 0x02bf46b7 in impala::HdfsOrcScanner::ProcessSplit > (this=0xeca4000) at be/src/exec/hdfs-orc-scanner.cc:588 > #16 0x02d427ff in impala::HdfsScanNode::ProcessSplit 
(this=0xff85800, > filter_ctxs=..., expr_results_pool=0x7fd41a29b4e0, scan_range=0xf2bde00, > scanner_thread_reservation=0x7fd41a29b408) at > be/src/exec/hdfs-scan-node.cc:500 > #17 0x02d41b80 in impala::HdfsScanNode::ScannerThread > (this=0xff85800, first_thread=false, scanner_thread_reservation=16384) at > be/src/exec/hdfs-scan-node.cc:418 > #18 0x02d40ee8 in impala::HdfsScanNodeoperator()(void) > const (__closure=0x7fd41a29bc08) at be/src/exec/hdfs-scan-node.cc:339 > #19 0x02d43afb in > boost::detail::function::void_function_obj_invoker0, > void>::invoke(boost::detail::function::function_buffer &) > (function_obj_ptr=...) > at > /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #20 0x022de8ca in boost::function0::operator() > (this=0x7fd41a29bc00) at > /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #21 0x02aa43a0 in > impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) (name=..., category=..., > functor=..., parent_thread_info=0x7fd40f8858a0, > thread_started=0x7fd40f8846a0) at be/src/util/thread.cc:360 > #22 0x02aacd01 in > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > s
[jira] [Created] (IMPALA-11059) Speed up zipping unnest by reading collection elements in batches
Daniel Becker created IMPALA-11059: -- Summary: Speed up zipping unnest by reading collection elements in batches Key: IMPALA-11059 URL: https://issues.apache.org/jira/browse/IMPALA-11059 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Try to speed up zipping unnest by reading collection elements in batches. Currently we read from the collections row-wise, that is, we read one element from each collection, store it in the corresponding column of the current row, and then proceed to the next element of each collection for the next row, and so on. The proposal here is to fill the row batch column-wise, i.e. first fill the column corresponding to the first collection, then the column corresponding to the second collection, etc. -- This message was sent by Atlassian Jira (v8.20.1#820001)
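To make the difference between the two fill orders concrete, here is a minimal Python model. It assumes each collection is a plain list and that a collection which runs out of elements contributes NULL (None) to the remaining rows; the function names are hypothetical, and this is not the backend code, which is C++ and works on row batches and tuple slots.

```python
from typing import List


def zip_unnest_row_wise(collections: List[list]) -> List[list]:
    """Current approach: build each output row by taking the next element
    from every collection, then move on to the next row."""
    num_rows = max((len(c) for c in collections), default=0)
    batch = []
    for row_idx in range(num_rows):
        # Assumption: a collection that has run out of elements yields NULL.
        row = [c[row_idx] if row_idx < len(c) else None for c in collections]
        batch.append(row)
    return batch


def zip_unnest_column_wise(collections: List[list]) -> List[list]:
    """Proposed approach: fill the whole column of the first collection, then
    the whole column of the second collection, and so on."""
    num_rows = max((len(c) for c in collections), default=0)
    # Pre-fill every slot with NULL so that shorter collections leave NULLs behind.
    batch = [[None] * len(collections) for _ in range(num_rows)]
    for col_idx, coll in enumerate(collections):
        # One tight loop per collection: this is where reading the elements
        # of a single collection in one batch could pay off.
        for row_idx, value in enumerate(coll):
            batch[row_idx][col_idx] = value
    return batch


arrays = [[1, 2, 3], ["a", "b"]]
print(zip_unnest_row_wise(arrays))     # [[1, 'a'], [2, 'b'], [3, None]]
print(zip_unnest_column_wise(arrays))  # [[1, 'a'], [2, 'b'], [3, None]]
```

Both orders produce the same rows; the column-wise version only changes the access pattern, so each collection is traversed once, contiguously.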
[jira] [Created] (IMPALA-11067) Unify struct subexpressions in rows
Daniel Becker created IMPALA-11067: -- Summary: Unify struct subexpressions in rows Key: IMPALA-11067 URL: https://issues.apache.org/jira/browse/IMPALA-11067 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Daniel Becker If a column is given multiple times in the select list, it is not duplicated under the hood in the row because we recognise that multiple columns in the result reference the same actual column, therefore the row size does not increase:
{code:java}
explain select id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+
{code}
With the id column duplicated:
{code:java}
explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+
{code}
However, if we query a struct and a subfield of the same struct, we do not reuse the existing slot in the row but duplicate the subexpression, increasing the row size:
{code:java}
explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=80B cardinality=5                                 |
+---------------------------------------------------------------+
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (IMPALA-11180) Ranger permission error
Daniel Becker created IMPALA-11180: -- Summary: Ranger permission error Key: IMPALA-11180 URL: https://issues.apache.org/jira/browse/IMPALA-11180 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Riza Suminto In some internal builds, test_ranger_show_stmts_with_select fails with a Ranger permission error, possibly caused by IMPALA-5256.
h3. Stacktrace
{code:java}
authorization/test_authorization.py:158: in test_ranger_show_stmts_with_select
    self._test_ranger_show_stmts_helper(unique_name, ['select'])
authorization/test_authorization.py:125: in _test_ranger_show_stmts_helper
    .format(priv, unique_name, priv, getuser()))
common/impala_connection.py:208: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E   INNER EXCEPTION:
E   MESSAGE: InternalException: Error granting a privilege in Ranger. Ranger error message: Permission denied.
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (IMPALA-11186) Assertion fails in TestShowCreateTable.test_show_create_table
Daniel Becker created IMPALA-11186: -- Summary: Assertion fails in TestShowCreateTable.test_show_create_table Key: IMPALA-11186 URL: https://issues.apache.org/jira/browse/IMPALA-11186 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker In internal builds, the test *metadata.test_show_create_table.TestShowCreateTable.test_show_create_table* fails with the following error:
h3. Error Message
{code:java}
metadata/test_show_create_table.py:64: in test_show_create_table
    unique_database)
metadata/test_show_create_table.py:118: in __run_show_create_table_test_case
    self.__compare_result(expected_result, create_table_result)
metadata/test_show_create_table.py:146: in __compare_result
    assert expected_tbl_props == actual_tbl_props
E   assert {'engine.hive...t': 'parquet'} == {'engine.hive71ac7bb', ...}
E     Omitting 4 identical items, use -v to show
E     Right contains more items:
E     {'uuid': '02004aff-d553-437e-8d2c-8b35f71ac7bb'}
E     Use -v to get the full diff
{code}
h3. Stacktrace
{code:java}
metadata/test_show_create_table.py:64: in test_show_create_table
    unique_database)
metadata/test_show_create_table.py:118: in __run_show_create_table_test_case
    self.__compare_result(expected_result, create_table_result)
metadata/test_show_create_table.py:146: in __compare_result
    assert expected_tbl_props == actual_tbl_props
E   assert {'engine.hive...t': 'parquet'} == {'engine.hive71ac7bb', ...}
E     Omitting 4 identical items, use -v to show
E     Right contains more items:
E     {'uuid': '02004aff-d553-437e-8d2c-8b35f71ac7bb'}
E     Use -v to get the full diff
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
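The assertion compares two dicts of table properties, and pytest reports that the right-hand side (the actual properties) has one extra entry, 'uuid', whose value looks like a per-table UUID and therefore cannot appear in a static expected result. The following minimal sketch reproduces that mismatch and shows one hypothetical way a comparison could tolerate it, namely dropping volatile keys before comparing. The property key, the VOLATILE_TBL_PROPS set and the helper are illustrative assumptions, not the actual test code or its fix.

```python
# Hypothetical: keys whose values are regenerated for every new table.
VOLATILE_TBL_PROPS = {"uuid"}


def without_volatile(props: dict) -> dict:
    """Return a copy of the table properties without the volatile keys."""
    return {k: v for k, v in props.items() if k not in VOLATILE_TBL_PROPS}


# Hypothetical values; the real keys are elided ("engine.hive...") in the log.
expected_tbl_props = {"some.property": "parquet"}
actual_tbl_props = {"some.property": "parquet",
                    "uuid": "02004aff-d553-437e-8d2c-8b35f71ac7bb"}

print(expected_tbl_props == actual_tbl_props)  # False, as in the failing assert
print(without_volatile(expected_tbl_props) == without_volatile(actual_tbl_props))  # True
```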
[jira] [Created] (IMPALA-11187) TestKuduOperations.test_read_modes fails
Daniel Becker created IMPALA-11187: -- Summary: TestKuduOperations.test_read_modes fails Key: IMPALA-11187 URL: https://issues.apache.org/jira/browse/IMPALA-11187 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker In an internal build, *query_test.test_kudu.TestKuduOperations.test_read_modes* fails:
h3. Error Message
{code:java}
query_test/test_kudu.py:551: in test_read_modes
    self._retry_query(cursor, "select count(*) from %s" % table_name, [(103,)])
query_test/test_kudu.py:535: in _retry_query
    assert retries < 3, \
E   AssertionError: Did not get a correct result for select count(*) from test_read_modes_53f93f33.test_read_latest after 3 retries: [(97,)]
E   assert 3 < 3
{code}
h3. Stacktrace
{code:java}
query_test/test_kudu.py:551: in test_read_modes
    self._retry_query(cursor, "select count(*) from %s" % table_name, [(103,)])
query_test/test_kudu.py:535: in _retry_query
    assert retries < 3, \
E   AssertionError: Did not get a correct result for select count(*) from test_read_modes_53f93f33.test_read_latest after 3 retries: [(97,)]
E   assert 3 < 3
{code}
h3. Standard Error
{code:java}
-- 2022-03-16 01:02:23,338 INFO MainThread: Using database testkuduoperations_1677_vu0h74 as default
SET client_identifier=query_test/test_kudu.py::TestKuduOperations::()::test_read_modes;
SET sync_ddl=False;
-- executing against localhost:21000
DROP DATABASE IF EXISTS `test_read_modes_53f93f33` CASCADE;
-- 2022-03-16 01:02:23,342 INFO MainThread: Started query 2e4741a082f8195b:c8427d1b
SET client_identifier=query_test/test_kudu.py::TestKuduOperations::()::test_read_modes;
SET sync_ddl=False;
-- executing against localhost:21000
CREATE DATABASE `test_read_modes_53f93f33`;
-- 2022-03-16 01:02:28,252 INFO MainThread: Started query c547f10a0d52e296:62a357c6
-- 2022-03-16 01:02:28,305 INFO MainThread: Created database "test_read_modes_53f93f33" for test ID "query_test/test_kudu.py::TestKuduOperations::()::test_read_modes"
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
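The failure message suggests a simple retry loop. The count query is re-run until it returns the expected [(103,)], and the assertion fires once the retry counter reaches 3 (hence "assert 3 < 3"), with [(97,)] as the last result. The sketch below models that pattern under stated assumptions: retry_query() is hypothetical and only modelled on the log output, not the real _retry_query(), and an injected execute callable stands in for running the query on a cursor and fetching all rows.

```python
import time


def retry_query(execute, query, expected, max_retries=3, delay_s=0.0):
    """Re-run `query` until it returns `expected` or `max_retries` runs fail.

    `execute` is a hypothetical stand-in for cursor.execute() followed by
    cursor.fetchall(); it returns the result rows as a list of tuples."""
    retries = 0
    result = None
    while retries < max_retries:
        result = execute(query)
        if result == expected:
            return result
        retries += 1
        time.sleep(delay_s)
    # Reaching this point corresponds to "assert 3 < 3" in the log.
    raise AssertionError("Did not get a correct result for %s after %d retries: %s"
                         % (query, retries, result))


# A scan that keeps returning 97 rows instead of the expected 103 exhausts all retries.
try:
    retry_query(lambda q: [(97,)], "select count(*) from t", [(103,)])
except AssertionError as e:
    print(e)  # Did not get a correct result for select count(*) from t after 3 retries: [(97,)]
```

A result that converges within the retry budget (for example, 97 rows on the first attempt and 103 on the second) passes, so a failure like this one means the scan consistently returned 97 rows across all three attempts.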
[jira] [Created] (IMPALA-11188) Could not find artifact org.apache.ozone:ozone-filesystem-hadoop3
Daniel Becker created IMPALA-11188: -- Summary: Could not find artifact org.apache.ozone:ozone-filesystem-hadoop3 Key: IMPALA-11188 URL: https://issues.apache.org/jira/browse/IMPALA-11188 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker In several internal builds, compilation fails with the following error:
[ERROR] Failed to execute goal on project impala-frontend: Could not resolve dependencies for project org.apache.impala:impala-frontend:jar:3.4.0-SNAPSHOT: Could not find artifact org.apache.ozone:ozone-filesystem-hadoop3:jar:1.1.0.7.1.8.0-531 in nexus-repo ( [http://nexus-private.hortonworks.com/nexus/content/groups/public] ) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] [http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException]
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :impala-frontend
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (IMPALA-11193) Assertion fails in ClientCacheTest.MemLeak
Daniel Becker created IMPALA-11193: -- Summary: Assertion fails in ClientCacheTest.MemLeak Key: IMPALA-11193 URL: https://issues.apache.org/jira/browse/IMPALA-11193 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Yida Wu The test {*}ClientCacheTest.MemLeak{*}, introduced in IMPALA-11176, fails in several internal builds.
h3. Error Message
{code:java}
Expected: (mem_before) > (0), actual: 0 vs 0
{code}
h3. Stacktrace
{code:java}
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/runtime/client-cache-test.cc:100
Expected: (mem_before) > (0), actual: 0 vs 0
{code}
Interestingly, it is not the main assertion that fails but a "precondition" check, namely EXPECT_GT(mem_before, 0).
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (IMPALA-11227) FE OOM in TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props
[ https://issues.apache.org/jira/browse/IMPALA-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11227. Resolution: Fixed > FE OOM in TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props > -- > > Key: IMPALA-11227 > URL: https://issues.apache.org/jira/browse/IMPALA-11227 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.0 >Reporter: Quanlong Huang >Assignee: Daniel Becker >Priority: Critical > > The huge values clause of the insert statement in > TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props could > cause FE OOM: > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5524/testReport/query_test.test_parquet_bloom_filter/TestParquetBloomFilter/test_fallback_from_dict_if_no_bloom_tbl_props_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/] > {code:bash} > query_test/test_parquet_bloom_filter.py:176: in > test_fallback_from_dict_if_no_bloom_tbl_props > False) > query_test/test_parquet_bloom_filter.py:228: in _create_table_dict_overflow > self.execute_query(insert_stmt, vector.get_value('exec_option')) > common/impala_test_suite.py:836: in wrapper > return function(*args, **kwargs) > common/impala_test_suite.py:868: in execute_query > return self.__execute_query(self.client, query, query_options) > common/impala_test_suite.py:961: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:212: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:189: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:365: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:359: in execute_query_async > 
handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:522: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: OutOfMemoryError: GC overhead limit exceeded > {code} > impalad.INFO > {code:java} > I0404 14:24:30.203562 19115 Frontend.java:1871] > 7d4c91ed04f27bc4:d32f7826] Analyzing query: insert into > test_fallback_from_dict_if_no_bloom_tbl_props_a60c835b.fallback_from_dict > values > (0),(2),(4),(6),(8),(10),(12),(14),(16),(18),(20),(22),(24),(26),(28),(30),(32),(34),(36),(38),(40),(42),(44),(46),(48),(50),(52),(54),(56),(58),(60)... > ... > I0404 14:25:18.025733 19115 jni-util.cc:286] > 7d4c91ed04f27bc4:d32f7826] java.lang.OutOfMemoryError: GC overhead > limit exceeded > at > java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:68) > at java.lang.StringBuilder.(StringBuilder.java:89) > at > org.apache.impala.analysis.SelectListItem.toSql(SelectListItem.java:84) > at org.apache.impala.analysis.SelectStmt.toSql(SelectStmt.java:1235) > at > org.apache.impala.analysis.StatementBase.toSql(StatementBase.java:138) > at > org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:308) > at > org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:269) > at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:262) > at > org.apache.impala.analysis.SetOperationStmt$SetOperand.analyze(SetOperationStmt.java:102) > at > org.apache.impala.analysis.SetOperationStmt.analyzeOperands(SetOperationStmt.java:388) > at > org.apache.impala.analysis.SetOperationStmt.analyze(SetOperationStmt.java:318) > at org.apache.impala.analysis.UnionStmt.analyze(UnionStmt.java:49) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:306) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:506) > at > 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2012) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1920) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1744) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code} > I saw this twice in another ubuntu-16.04-dockerised-tests job: > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5523/testReport/junit/query_test.test_par
[jira] [Created] (IMPALA-11242) Impala cluster doesn't start when building with debug_noopt
Daniel Becker created IMPALA-11242: -- Summary: Impala cluster doesn't start when building with debug_noopt Key: IMPALA-11242 URL: https://issues.apache.org/jira/browse/IMPALA-11242 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Daniel Becker Assignee: Daniel Becker After building Impala with buildall.sh using the -debug_noopt option, the Impala cluster cannot be started:
{code:java}
./buildall.sh -debug_noopt
[...]
bin/start-impala-cluster.py
Traceback (most recent call last):
  File "bin/start-impala-cluster.py", line 166, in
    KUDU_RPC_TIMEOUT = build_flavor_timeout(0, slow_build_timeout=6)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 416, in build_flavor_timeout
    cluster_properties = ImpalaTestClusterProperties.get_instance()
  File "/home/danielbecker/Impala/tests/common/environ.py", line 254, in get_instance
    ImpalaTestClusterFlagsDetector.detect_using_build_root_or_web_ui(IMPALA_HOME)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 175, in detect_using_build_root_or_web_ui
    ImpalaTestClusterFlagsDetector.validate_build_flags(build_type, library_link_type)
  File "/home/danielbecker/Impala/tests/common/environ.py", line 196, in validate_build_flags
    raise Exception("Unknown build type {0}".format(build_type))
Exception: Unknown build type debug_noopt
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
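The last frame shows the cause: validate_build_flags() checks the detected build type against the build types it knows about, and debug_noopt is not among them. Below is a minimal sketch of that allow-list check. The build type tuples and the helper name are hypothetical, and the real check in tests/common/environ.py may look different.

```python
# Hypothetical build type lists; the real ones live in tests/common/environ.py.
BUILD_TYPES_BEFORE = ("debug", "release")
BUILD_TYPES_AFTER = BUILD_TYPES_BEFORE + ("debug_noopt",)


def validate_build_type(build_type, known_build_types):
    """Reject build types the test framework does not know about, mirroring
    the check that raises in validate_build_flags()."""
    if build_type not in known_build_types:
        raise Exception("Unknown build type {0}".format(build_type))


try:
    validate_build_type("debug_noopt", BUILD_TYPES_BEFORE)
except Exception as e:
    print(e)  # Unknown build type debug_noopt

validate_build_type("debug_noopt", BUILD_TYPES_AFTER)  # passes without an exception
```

With an allow-list like this, every new build flavour added to buildall.sh also has to be registered on the test side, which matches the failure above.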
[jira] [Resolved] (IMPALA-11242) Impala cluster doesn't start when building with debug_noopt
[ https://issues.apache.org/jira/browse/IMPALA-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11242. Resolution: Fixed > Impala cluster doesn't start when building with debug_noopt > --- > > Key: IMPALA-11242 > URL: https://issues.apache.org/jira/browse/IMPALA-11242 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > After building Impala with buildall.sh using the -debug_noopt option, the > Impala cluster cannot be started: > > {code:java} > ./buildall.sh -debug_noopt > [...] > bin/start-impala-cluster.py > Traceback (most recent call last): > File "bin/start-impala-cluster.py", line 166, in > KUDU_RPC_TIMEOUT = build_flavor_timeout(0, slow_build_timeout=6) > File "/home/danielbecker/Impala/tests/common/environ.py", line 416, in > build_flavor_timeout > cluster_properties = ImpalaTestClusterProperties.get_instance() > File "/home/danielbecker/Impala/tests/common/environ.py", line 254, in > get_instance > ImpalaTestClusterFlagsDetector.detect_using_build_root_or_web_ui(IMPALA_HOME) > File "/home/danielbecker/Impala/tests/common/environ.py", line 175, in > detect_using_build_root_or_web_ui > ImpalaTestClusterFlagsDetector.validate_build_flags(build_type, > library_link_type) > File "/home/danielbecker/Impala/tests/common/environ.py", line 196, in > validate_build_flags > raise Exception("Unknown build type {0}".format(build_type)) > Exception: Unknown build type debug_noopt > {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (IMPALA-10839) NULL values are displayed on a wrong level for nested structs (ORC)
[ https://issues.apache.org/jira/browse/IMPALA-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10839. Resolution: Fixed > NULL values are displayed on a wrong level for nested structs (ORC) > --- > > Key: IMPALA-10839 > URL: https://issues.apache.org/jira/browse/IMPALA-10839 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Gabor Kaszab >Assignee: Daniel Becker >Priority: Major > Labels: ORC, complextype, correctness, nested_types, scanner > > When querying a non-top-level nested struct, the NULL values are displayed > at the wrong level. E.g.: > {code:java} > select id, outer_struct.inner_struct3 from > functional_orc_def.complextypes_nested_structs where id >= 4; > {code} > {code:java} > +++ > | id | outer_struct.inner_struct3 | > +++ > | 4 | {"s":{"i":null,"s":null}} | > | 5 | {"s":null} | > +++ > {code} > However, in the first row 's' itself is expected to be null rather than > its members, and in the second row the result should be NULL. > For reference, see what is returned when querying 'outer_struct' instead of > 'outer_struct.inner_struct3': > {code:java} > ++---+ > | 4 | > {"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}} > | > | 5 | > {"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null} > | > ++---+ > {code} > Note that this issue occurs with the ORC format. > After some digging, I found that these incorrect null values are already > present in the ORC scanner, where OrcStructReader reads the rows in the > ReadValue() and ReadValueBatch() functions. > As a first step, it would be nice to verify that the external ORC reader we > use for reading the actual values from the files gives correct results. -- This message was sent by Atlassian Jira (v8.20.7#820007)
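The expected semantics can be stated as a simple rule: when resolving outer_struct.inner_struct3, a NULL at any level of the path yields NULL at that level, not a struct whose members are NULL. The minimal Python sketch below illustrates that rule using the values from the 'outer_struct' reference output above. It assumes nested dicts with None as a stand-in for Impala's struct slots and null indicators, and it is not the OrcStructReader code.

```python
import json

# Rows rebuilt from the reference output of 'outer_struct' for id = 4 and id = 5.
rows = {
    4: {"outer_struct": {"str": "",
                         "inner_struct1": {"str": "somestr2", "de": 12345.12},
                         "inner_struct2": {"i": 1, "str": "string"},
                         "inner_struct3": {"s": None}}},
    5: {"outer_struct": {"str": None, "inner_struct1": None,
                         "inner_struct2": None, "inner_struct3": None}},
}


def get_path(value, path):
    """Resolve a dotted struct path. A NULL met along the path makes the
    result NULL; it is never expanded into a struct with NULL members."""
    for field in path.split("."):
        if value is None:
            return None
        value = value[field]
    return value


for row_id, row in rows.items():
    result = get_path(row, "outer_struct.inner_struct3")
    print(row_id, "NULL" if result is None else json.dumps(result, separators=(",", ":")))
# 4 {"s":null}
# 5 NULL
```

Compared with the incorrect output in the description ({"s":{"i":null,"s":null}} and {"s":null}), the faulty reader applies each null one level too deep.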
[jira] [Resolved] (IMPALA-10838) Error when struct returned from WITH() and used in an ORDER BY
[ https://issues.apache.org/jira/browse/IMPALA-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10838. Resolution: Fixed > Error when struct returned from WITH() and used in an ORDER BY > -- > > Key: IMPALA-10838 > URL: https://issues.apache.org/jira/browse/IMPALA-10838 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Gabor Kaszab >Assignee: Daniel Becker >Priority: Major > Labels: complextype, nested_types > > {code:java} > with sub as ( > select id, small_struct > from functional_orc_def.complextypes_structs > where length(small_struct.s) > 5) > select sub.id, sub.small_struct from sub order by sub.small_struct.i desc; > {code} > The above query results in an error when trying to run SlotRef.toThrift(): > {code:java} > ERROR: IllegalStateException: Illegal reference to non-materialized tuple: > debugname=InlineViewRef sub alias=sub tid=2 > {code} > If I rewrite the query slightly so that the inline view (WITH()) returns the > struct member and use that column in the ORDER BY, the query succeeds as > expected: > {code:java} > with sub as ( > select id, small_struct, small_struct.i as si > from functional_orc_def.complextypes_structs where small_struct.i > 19200) > select sub.id, sub.small_struct from sub order by sub.si desc; > {code} > In SortNode.toThrift() I checked the sort exprs and the resolved tuple > exprs, and I see a difference that could be the cause.
> In the problematic case: > {code:java} > - sort exprs in SortNode: > SlotRef{label=small_struct.i, type=INT, id=15} > - resolved exprs in SortNode: > SlotRef{label=id, path=id, type=INT, id=0} > SlotRef{label=small_struct, path=small_struct, type=STRUCT, > id=1} > *SlotRef{label=sub.small_struct.i, path=sub.small_struct.i, type=INT, > id=10}* > {code} > In the successful case: > {code:java} > - sort exprs in SortNode: > SlotRef{label=si, type=INT, id=14} > - resolved exprs in SortNode: > SlotRef{label=id, path=id, type=INT, id=0} > SlotRef{label=small_struct, path=small_struct, type=STRUCT, > id=1} > *SlotRef{label=small_struct.i, path=small_struct.i, type=INT, id=4}* > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (IMPALA-11067) Unify struct subexpressions in rows
[ https://issues.apache.org/jira/browse/IMPALA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11067. Resolution: Fixed > Unify struct subexpressions in rows > --- > > Key: IMPALA-11067 > URL: https://issues.apache.org/jira/browse/IMPALA-11067 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: complextype, nested_types > > If a column is given multiple times in the select list, it is not duplicated > under the hood in the row because we recognise that multiple columns in the > result reference the same actual column, therefore the row size does not > increase: > > {code:java} > explain select id, outer_struct from > functional_orc_def.complextypes_nested_structs; > Query: explain select id, outer_struct from > functional_orc_def.complextypes_nested_structs > +---+ > | Explain String | > +---+ > | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2 | > | Per-Host Resource Estimates: Memory=20MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] | > | HDFS partitions=1/1 files=1 size=1.18KB | > | row-size=64B cardinality=5 | > +---+ > {code} > With the id column duplicated: > > {code:java} > explain select id, id, outer_struct from > functional_orc_def.complextypes_nested_structs; > Query: explain select id, id, outer_struct from > functional_orc_def.complextypes_nested_structs > +---+ > | Explain String | > +---+ > | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2 | > | Per-Host Resource Estimates: Memory=20MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] | > | HDFS partitions=1/1 files=1 size=1.18KB | > | row-size=64B cardinality=5 | > +---+ > {code} > However, if we query a struct and a subfield of the same struct, we do not > 
reuse the existing slot in the row but duplicate the subexpression, > increasing the row size: > > {code:java} > explain select id, outer_struct, outer_struct.inner_struct2 from > functional_orc_def.complextypes_nested_structs; > Query: explain select id, outer_struct, outer_struct.inner_struct2 from > functional_orc_def.complextypes_nested_structs > +---+ > | Explain String | > +---+ > | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2 | > | Per-Host Resource Estimates: Memory=20MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] | > | HDFS partitions=1/1 files=1 size=1.18KB | > | row-size=80B cardinality=5 | > +---+ > {code} > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (IMPALA-9470) Use Parquet bloom filters
[ https://issues.apache.org/jira/browse/IMPALA-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-9470. --- Resolution: Implemented > Use Parquet bloom filters > - > > Key: IMPALA-9470 > URL: https://issues.apache.org/jira/browse/IMPALA-9470 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Daniel Becker >Priority: Major > Labels: parquet > > PARQUET-41 has been closed recently. This means Parquet-MR is capable of > writing and reading bloom filters. > Currently, bloom filters are stored per column chunk, i.e. with their help we > can filter out entire row groups. > We already filter row groups in HdfsParquetScanner::NextRowGroup() based on > column chunk statistics and dictionaries. Skipping row groups based on bloom > filters could also be added to this function. > Impala could also write bloom filters. -- This message was sent by Atlassian Jira (v8.20.7#820007)
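The row-group skipping idea above can be sketched as a toy model. It rests on stated assumptions: `ToyBloomFilter`, `RowGroupMeta` and `RowGroupsToRead` are hypothetical names, and the filter is a plain bitset with double hashing, not the split-block Bloom filter format defined by the Parquet specification. What it shows is the property that makes skipping safe: a Bloom filter may report false positives but never false negatives, so a row group is only skipped when the value is definitely absent.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy per-column-chunk Bloom filter (hypothetical; NOT the Parquet on-disk format).
class ToyBloomFilter {
 public:
  void Insert(int64_t v) {
    for (uint64_t h : Hashes(v)) bits_.set(h % kBits);
  }
  // false: v is definitely not in the column chunk.
  // true: v may be present (false positives are possible).
  bool MayContain(int64_t v) const {
    for (uint64_t h : Hashes(v)) {
      if (!bits_.test(h % kBits)) return false;
    }
    return true;
  }

 private:
  static constexpr std::size_t kBits = 1024;
  // Double hashing: derive k = 3 bit positions from two base hashes.
  static std::array<uint64_t, 3> Hashes(int64_t v) {
    uint64_t h1 = std::hash<int64_t>{}(v);
    uint64_t h2 = h1 * 0x9E3779B97F4A7C15ULL + 1;
    return {{h1, h1 + h2, h1 + 2 * h2}};
  }
  std::bitset<kBits> bits_;
};

// Hypothetical row-group metadata holding one filter for the column "id".
struct RowGroupMeta {
  ToyBloomFilter id_filter;
};

// Returns the indices of row groups that must still be read for `id = value`;
// all other row groups can be skipped without touching their data.
std::vector<std::size_t> RowGroupsToRead(const std::vector<RowGroupMeta>& groups,
                                         int64_t value) {
  std::vector<std::size_t> to_read;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    if (groups[i].id_filter.MayContain(value)) to_read.push_back(i);
  }
  return to_read;
}
```

In a real scanner this check would sit next to the existing statistics and dictionary checks, and a row group survives only if every applicable check says it may contain matching rows.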
[jira] [Resolved] (IMPALA-10929) Optimise memory usage of structs in tuples
[ https://issues.apache.org/jira/browse/IMPALA-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10929. Resolution: Duplicate > Optimise memory usage of structs in tuples > -- > > Key: IMPALA-10929 > URL: https://issues.apache.org/jira/browse/IMPALA-10929 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > If we have both a whole struct and one of its members (or a member of a > member etc.) in the select list, the whole struct and the member are assigned > to different slots in the tuple. We could use less memory if the member > expression used the slot within the whole struct instead. > Example: > For the query > {code:java} > select id, outer_struct from functional_orc_def.complextypes_nested_structs; > {code} > the row size is 64B, while for > {code:java} > select id, outer_struct, outer_struct.inner_struct2 from > functional_orc_def.complextypes_nested_structs; > {code} > it is 80B, although it should not need more memory. > It is not limited to the select list; it should also work with where clauses > etc. For example, > {code:java} > select id, outer_struct from functional_orc_def.complextypes_nested_structs > where outer_struct.inner_struct2.i > 1; > {code} > should also have a row size of 64B instead of 68B. -- This message was sent by Atlassian Jira (v8.20.7#820007)
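To make the proposed optimisation concrete, here is a toy model of the row-size arithmetic. It is a hypothetical sketch, not Impala's planner code: `RowSize` and `IsStrictPrefix` are illustrative names, and any slot sizes fed into it are invented. The idea is that a requested path whose ancestor struct is also requested reuses the ancestor's slot, so it adds no bytes to the row, just as a duplicated top-level column (`id, id`) already adds none.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// True if `ancestor` is a strict dotted-path prefix of `path`,
// e.g. "outer_struct" of "outer_struct.inner_struct2".
bool IsStrictPrefix(const std::string& ancestor, const std::string& path) {
  return path.size() > ancestor.size() &&
         path.compare(0, ancestor.size(), ancestor) == 0 &&
         path[ancestor.size()] == '.';
}

// Hypothetical row-size computation: each distinct requested path gets its
// own slot unless one of its ancestors is also requested, in which case it
// reuses the memory inside the ancestor's slot.
int RowSize(const std::vector<std::string>& requested,
            const std::map<std::string, int>& slot_bytes) {
  // Duplicates such as "id, id" share a single slot.
  std::set<std::string> unique(requested.begin(), requested.end());
  int total = 0;
  for (const std::string& path : unique) {
    bool covered_by_ancestor = false;
    for (const std::string& other : unique) {
      if (IsStrictPrefix(other, path)) covered_by_ancestor = true;
    }
    if (!covered_by_ancestor) total += slot_bytes.at(path);
  }
  return total;
}
```

With invented sizes of 4 bytes for `id`, 60 for `outer_struct` and 16 for `outer_struct.inner_struct2`, both `{"id", "outer_struct"}` and `{"id", "outer_struct", "outer_struct.inner_struct2"}` yield 64, whereas allocating a separate slot for the member would give 80.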
[jira] [Created] (IMPALA-11365) Dereferencing null pointer in TopNNode
Daniel Becker created IMPALA-11365: -- Summary: Dereferencing null pointer in TopNNode Key: IMPALA-11365 URL: https://issues.apache.org/jira/browse/IMPALA-11365 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker In the constructor of TopNNode, if {{pnode.partition_comparator_config_}} is NULL, we initialise {{partition_cmp_}} with a NULL pointer. However, when initialising {{partition_heaps_}}, we dereference {{partition_cmp_}} because {{ComparatorWrapper}} expects a reference. This has not led to a crash so far because in this case the comparator of {{partition_heaps_}} is not used, but binding a reference to a dereferenced NULL pointer is undefined behaviour. -- This message was sent by Atlassian Jira (v8.20.7#820007)
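One way to avoid forming a reference from a NULL pointer is to have the wrapper hold a pointer and only dereference it when a comparison is actually made. The sketch below is hypothetical: `SketchComparator` and `ComparatorPtrWrapper` are illustrative stand-ins, not Impala's classes, and the merged fix may look different (for example, not constructing the partition heaps at all when there is no partition comparator).

```cpp
#include <cassert>

// Hypothetical comparator standing in for the partition comparator.
struct SketchComparator {
  bool Less(int a, int b) const { return a < b; }
};

// Holds a pointer rather than a reference, so constructing it from a null
// comparator is well-defined. The pointer is only dereferenced when a
// comparison is actually performed.
class ComparatorPtrWrapper {
 public:
  explicit ComparatorPtrWrapper(const SketchComparator* cmp) : cmp_(cmp) {}

  bool operator()(int a, int b) const {
    assert(cmp_ != nullptr && "comparator used but never configured");
    return cmp_->Less(a, b);
  }

  bool has_comparator() const { return cmp_ != nullptr; }

 private:
  const SketchComparator* cmp_;
};
```

`ComparatorPtrWrapper unused(nullptr);` is well-defined as long as `unused` is never called, which matches the situation described above where the comparator of `partition_heaps_` is not used.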
[jira] [Resolved] (IMPALA-11365) Dereferencing null pointer in TopNNode
[ https://issues.apache.org/jira/browse/IMPALA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11365. Resolution: Fixed > Dereferencing null pointer in TopNNode > -- > > Key: IMPALA-11365 > URL: https://issues.apache.org/jira/browse/IMPALA-11365 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In the constructor of TopNNode, if {{pnode.partition_comparator_config_}} is > NULL, we initialise {{partition_cmp_}} with a NULL pointer. However, when > initialising {{partition_heaps_}}, we dereference {{partition_cmp_}} > because {{ComparatorWrapper}} expects a reference. > This has not led to a crash so far because in this case the comparator of > {{partition_heaps_}} is not used, but binding a reference to a dereferenced > NULL pointer is undefined behaviour. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IMPALA-11410) Codegen crashes instead of reporting corrupt function
Daniel Becker created IMPALA-11410: -- Summary: Codegen crashes instead of reporting corrupt function Key: IMPALA-11410 URL: https://issues.apache.org/jira/browse/IMPALA-11410 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker In {{FragmentState::CodegenHelper}} we call {{plan_tree_->Codegen(this)}} and {{sink_config_->Codegen(this)}}, but the status returned by codegen is discarded (or only used in the profile). If codegen fails because of a bug and the generated functions fail verification, {{LlvmCodeGen::is_corrupt_}} is set to true, which means all further functions will fail verification too. This can lead to {{LlvmCodeGen::GetHashFunction}} returning {{NULL}}, but {{HashTableCtx::CodegenHashRow}} dereferences this {{NULL}} pointer, causing a crash. See [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/exec/hash-table.cc#L1043] (the pointer in question is {{hash_fn}}). This situation only arises if there is already a bug in code generation. However, if the codegen bug is in a {{ScalarExpr}}, for example a {{SlotRef}}, we return an error message instead of crashing. See {{FragmentState::CodegenHelper}} for how these cases are handled differently. It would help debugging if we handled these cases uniformly, by returning an error message. Steps to reproduce: 1. Introduce an error in {{FilterContext::CodegenEval}} by deleting a {{CreateBr}} call. 2. Run the following query: {code:sql} select a.outer_struct.inner_struct2.i, b.small_struct.i from functional_orc_def.complextypes_nested_structs a inner join functional_orc_def.complextypes_structs b on b.small_struct.i = a.outer_struct.inner_struct2.i + 19091 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
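The uniform handling requested here amounts to checking the pointer and returning an error status instead of dereferencing it. A minimal sketch under assumptions: `SketchStatus`, `SketchFunction`, `LookupHashFn` and `CodegenHashRowSketch` are hypothetical stand-ins for Impala's `Status`, `llvm::Function`, `LlvmCodeGen::GetHashFunction` and `HashTableCtx::CodegenHashRow`, whose real signatures are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal, hypothetical stand-in for a status type.
struct SketchStatus {
  bool ok;
  std::string msg;
  static SketchStatus OK() { return {true, ""}; }
  static SketchStatus Error(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical stand-in for a generated function.
struct SketchFunction {
  std::string name;
};

// Mimics a lookup that yields nullptr once the module is marked corrupt.
SketchFunction* LookupHashFn(bool module_is_corrupt, SketchFunction* fn) {
  return module_is_corrupt ? nullptr : fn;
}

// Checks the returned pointer and reports an error status instead of
// dereferencing nullptr and crashing.
SketchStatus CodegenHashRowSketch(bool module_is_corrupt,
                                  SketchFunction* candidate,
                                  SketchFunction** out) {
  SketchFunction* hash_fn = LookupHashFn(module_is_corrupt, candidate);
  if (hash_fn == nullptr) {
    return SketchStatus::Error("Codegen failed: hash function is unavailable");
  }
  *out = hash_fn;
  return SketchStatus::OK();
}
```

For this to help debugging, the caller also has to propagate the returned status rather than discard it, which is the other half of the problem described in the issue.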
[jira] [Created] (IMPALA-11412) CodegenFnPtr::store() has a compile time error when instantiated
Daniel Becker created IMPALA-11412: -- Summary: CodegenFnPtr::store() has a compile time error when instantiated Key: IMPALA-11412 URL: https://issues.apache.org/jira/browse/IMPALA-11412 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker The function template {{CodegenFnPtr::store()}} tries to implicitly convert a function pointer (of type {{FuncType}}) to {{void*}}, which is a compile time error. The reason this didn't come up in the builds is that this function template is currently not used anywhere, and the function pointers are stored through the parent class, using {{CodegenFnPtrBase::store()}}, which takes a {{void*}}. We should either # remove the hitherto unused {{CodegenFnPtr::store()}} function template OR # add the correct explicit cast from function pointer to {{void*}} AND add a test which instantiates (and tests) this function template so we can be sure that the new implementation is correct. I'm inclined to choose the second option because I think the interface of {{CodegenFnPtr}} is more complete if we have this function as well, even if it is currently not used. Note: After digging a bit on the internet I found that an implicit function pointer to {{void*}} conversion is not allowed (as opposed to an implicit regular pointer to {{void*}} conversion) because the standard doesn't guarantee that regular and function pointers have the same size, and there are some architectures where they actually don't. However, according to 8) on [https://en.cppreference.com/w/cpp/language/reinterpret_cast], POSIX compliant systems do have this guarantee, so it shouldn't be a problem that we store function pointers as {{void*}}.
We don't really have a choice because LLVM does the same: {{llvm::ExecutionEngine::getPointerToFunction()}} returns a {{void*}} (see [https://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html#acc46759a6acfc3d116c3f22110326ffa]); we call that function [here|https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/codegen/llvm-codegen.cc#L1315]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
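A sketch of the second option, under assumptions: `CodegenFnPtrBaseSketch` and `CodegenFnPtrSketch` are simplified, hypothetical stand-ins (the real classes' members are not shown in the issue), and the untyped pointer is kept in a `std::atomic<void*>`. The key line is the explicit `reinterpret_cast` in `store()`. Converting between function and object pointers this way is conditionally-supported in C++11 and later, which is the guarantee POSIX systems provide.

```cpp
#include <atomic>
#include <cassert>

// Simplified, hypothetical stand-in for the base class: stores an untyped
// pointer, in the same way LLVM hands back JIT-compiled functions as void*.
class CodegenFnPtrBaseSketch {
 public:
  void store(void* p) { ptr_.store(p, std::memory_order_release); }
  void* load() const { return ptr_.load(std::memory_order_acquire); }

 private:
  std::atomic<void*> ptr_{nullptr};
};

// Typed wrapper. A function pointer does not implicitly convert to void*, so
// store() needs an explicit reinterpret_cast; load() casts back.
template <typename FuncType>
class CodegenFnPtrSketch : public CodegenFnPtrBaseSketch {
 public:
  void store(FuncType fn) {
    CodegenFnPtrBaseSketch::store(reinterpret_cast<void*>(fn));
  }
  FuncType load() const {
    return reinterpret_cast<FuncType>(CodegenFnPtrBaseSketch::load());
  }
};

// Example function used to instantiate the template.
int AddOne(int x) { return x + 1; }
```

Instantiating the template is what surfaces the error: after `CodegenFnPtrSketch<int (*)(int)> p; p.store(&AddOne);`, calling `p.load()(41)` invokes `AddOne` and yields 42, and without the explicit cast the `store()` line would not compile.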
[jira] [Created] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children
Daniel Becker created IMPALA-11416: -- Summary: SlotRef::tuple_is_nullable_ uninitialised for struct children Key: IMPALA-11416 URL: https://issues.apache.org/jira/browse/IMPALA-11416 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the {{SlotRef}} is not within a struct: {code:cpp} if (!slot_desc_->parent()->isTupleOfStructSlot()) { tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_); } {code} https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103 Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined behaviour when it is read. -- This message was sent by Atlassian Jira (v8.20.10#820010)
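A minimal sketch of one possible fix: give the member a default member initializer, so it has a defined value even on the path that `Init()` skips. `SlotRefSketch` is a hypothetical, heavily simplified stand-in for `SlotRef`, and choosing `false` as the value for struct children is an assumption, not something the issue states.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for SlotRef.
class SlotRefSketch {
 public:
  void Init(bool is_struct_child, bool row_tuple_is_nullable) {
    if (!is_struct_child) {
      tuple_is_nullable_ = row_tuple_is_nullable;
    }
    // For struct children the member keeps its default value below instead
    // of being left uninitialised.
  }
  bool tuple_is_nullable() const { return tuple_is_nullable_; }

 private:
  bool tuple_is_nullable_ = false;  // default member initializer (C++11)
};
```

A default member initializer has the advantage that every constructor picks it up automatically, so a future code path that skips the assignment cannot reintroduce the uninitialised read.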
[jira] [Resolved] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children
[ https://issues.apache.org/jira/browse/IMPALA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11416. Resolution: Fixed > SlotRef::tuple_is_nullable_ uninitialised for struct children > - > > Key: IMPALA-11416 > URL: https://issues.apache.org/jira/browse/IMPALA-11416 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the > {{SlotRef}} is not within a struct: > {code:cpp} > if (!slot_desc_->parent()->isTupleOfStructSlot()) { > tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_); > } > {code} > https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103 > Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined > behaviour when it is read. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (IMPALA-11416) SlotRef::tuple_is_nullable_ uninitialised for struct children
[ https://issues.apache.org/jira/browse/IMPALA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker closed IMPALA-11416. -- > SlotRef::tuple_is_nullable_ uninitialised for struct children > - > > Key: IMPALA-11416 > URL: https://issues.apache.org/jira/browse/IMPALA-11416 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In {{SlotRef::Init}}, {{tuple_is_nullable_}} is only assigned a value if the > {{SlotRef}} is not within a struct: > {code:cpp} > if (!slot_desc_->parent()->isTupleOfStructSlot()) { > tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_); > } > {code} > https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/exprs/slot-ref.cc#L103 > Otherwise {{tuple_is_nullable_}} remains uninitialised, leading to undefined > behaviour when it is read. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11412) CodegenFnPtr::store() has a compile time error when instantiated
[ https://issues.apache.org/jira/browse/IMPALA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11412. Resolution: Fixed > CodegenFnPtr::store() has a compile time error when instantiated > -- > > Key: IMPALA-11412 > URL: https://issues.apache.org/jira/browse/IMPALA-11412 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > The function template {{CodegenFnPtr::store()}} tries to implicitly > convert a function pointer (of type {{FuncType}}) to {{void*}}, which is a > compile time error. The reason this didn't come up in the builds is that this > function template is currently not used anywhere, and the function pointers > are stored through the parent class, using {{CodegenFnPtrBase::store()}}, > which takes a {{void*}}. > We should either > # remove the hitherto unused {{CodegenFnPtr::store()}} function > template > OR > # add the correct explicit cast from function pointer to {{void*}} AND add a > test which instantiates (and tests) this function template so we can be sure > that the new implementation is correct. > I'm inclined to choose the second option because I think the interface of > {{CodegenFnPtr}} is more complete if we have this function as well, > even if it is currently not used. > Note: > After digging a bit on the internet I found that an implicit > function pointer to {{void*}} conversion is not allowed (as opposed to an implicit > regular pointer to {{void*}} conversion) because the standard doesn't guarantee that > regular and function pointers have the same size, and there are some > architectures where they actually don't. > However, according to 8) on > [https://en.cppreference.com/w/cpp/language/reinterpret_cast], POSIX > compliant systems do have this guarantee, so it shouldn't be a problem that > we store function pointers as {{void*}}.
We don't really have a choice > because LLVM does the same: > {{llvm::ExecutionEngine::getPointerToFunction()}} returns a {{void*}} (see > [https://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html#acc46759a6acfc3d116c3f22110326ffa]); > we call that function > [here|https://github.com/apache/impala/blob/fefb9f24be1f99ac0077a8d6ef00834d8e90ef45/be/src/codegen/llvm-codegen.cc#L1315]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11425) Python TypeError: super() takes at least 1 argument (0 given)
Daniel Becker created IMPALA-11425: -- Summary: Python TypeError: super() takes at least 1 argument (0 given) Key: IMPALA-11425 URL: https://issues.apache.org/jira/browse/IMPALA-11425 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker The following error happens in various builds during tarball creation: {code:java} Traceback (most recent call last): File "setup.py", line 167, in <module> 'Topic :: Database :: Front-Ends' File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup dist.run_commands() File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/command/sdist.py", line 153, in run self.run_command(cmd_name) File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/usr/lib64/python2.7/distutils/dist.py", line 970, in run_command cmd_obj = self.get_command_obj(command) File "/usr/lib64/python2.7/distutils/dist.py", line 845, in get_command_obj klass = self.get_command_class(command) File "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/dist.py", line 410, in get_command_class return _Distribution.get_command_class(self, command) File "/usr/lib64/python2.7/distutils/dist.py", line 815, in get_command_class __import__ (module_name) File "/usr/lib64/python2.7/distutils/command/check.py", line 13, in <module> from docutils.utils import Reporter File "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", line 123, in <module> release=True # True for official releases and pre-releases File "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", line 93, in __new__ return super().__new__(cls, major, minor, micro, TypeError: super() takes at least 1 argument (0 given) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11427) TestOrcStats.test_orc_stats fails
Daniel Becker created IMPALA-11427: -- Summary: TestOrcStats.test_orc_stats fails Key: IMPALA-11427 URL: https://issues.apache.org/jira/browse/IMPALA-11427 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker In one of the builds, query_test.test_orc_stats.TestOrcStats.test_orc_stats fails: {code:java} query_test/test_orc_stats.py:40: in test_orc_stats self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database) common/impala_test_suite.py:820: in run_test_case update_section=pytest.config.option.update_results) common/test_result_verifier.py:665: in verify_runtime_profile % (function, field, expected_value, actual_value, op, actual)) E AssertionError: Aggregation of SUM over RowsRead did not match expected results. E EXPECTED VALUE: E 5 E E E ACTUAL VALUE: E 0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
Daniel Becker created IMPALA-11431: -- Summary: TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build Key: IMPALA-11431 URL: https://issues.apache.org/jira/browse/IMPALA-11431 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker In one of the exhaustive builds, query_test.test_nested_types.TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails: {code:java} query_test/test_nested_types.py:252: in test_compute_stats_with_structs self.run_test_case('QueryTest/compute-stats-with-structs', vector) common/impala_test_suite.py:778: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:588: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:469: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 'alltypes','STRUCT',-1,-1,-1,-1.0,-1,-1 == 'alltypes','STRUCT',-1,-1,-1,-1,-1,-1 E 'id','INT',6,0,4,4.0,-1,-1 != 'id','INT',-1,-1,4,4,-1,-1 E 'small_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 'small_struct','STRUCT',-1,-1,-1,-1,-1,-1 E 'str','STRING',6,0,11,10.330154,-1,-1 != 'str','STRING',-1,-1,-1,-1,-1,-1 E 'tiny_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == 'tiny_struct','STRUCT',-1,-1,-1,-1,-1,-1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11432) TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup
Daniel Becker created IMPALA-11432: -- Summary: TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup Key: IMPALA-11432 URL: https://issues.apache.org/jira/browse/IMPALA-11432 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker In one of the exhaustive builds authorization.test_ranger.TestRanger.test_grant_revoke_with_role failed with one of the impalads stuck during startup: Stacktrace {code:java} common/custom_cluster_test_suite.py:181: in setup_method self._start_impala_cluster(cluster_args, **kwargs) common/custom_cluster_test_suite.py:285: in _start_impala_cluster check_call(cmd + options, close_fds=True) /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/Impala-Toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call raise CalledProcessError(retcode, cmd) E CalledProcessError: Command '['/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=3', '--log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger ', '--state_store_args=None ', '--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger ', '--impalad_args=--default_query_options=']' returned non-zero exit status 1 {code} Standard Error {code:java} -- 2022-07-14 01:07:04,943 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py 
'--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 --log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests --log_level=1 '--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger ' '--state_store_args=None ' '--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger --use_customized_user_groups_mapper_for_ranger ' --impalad_args=--default_query_options= 01:07:05 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) 01:07:05 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO 01:07:05 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO 01:07:05 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO 01:07:05 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO 01:07:05 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO 01:07:08 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:09 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:10 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:11 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:12 MainThread: Found 2 impalad/1 statestored/1 
catalogd process(es) 01:07:13 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:14 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:15 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:16 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:17 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:18 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) 01:07:18 MainThread: Error starting cluster Traceback (most recent call last): File "/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py", line 840, in expected_cluster_size - expected_catalog_delays) File "/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tes
[jira] [Created] (IMPALA-11443) Possible overflow in SortNode.java
Daniel Becker created IMPALA-11443: -- Summary: Possible overflow in SortNode.java Key: IMPALA-11443 URL: https://issues.apache.org/jira/browse/IMPALA-11443 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Daniel Becker At line https://github.com/apache/impala/blob/d029ae53676c8637bc4f56b80b331920c8289108/fe/src/main/java/org/apache/impala/planner/SortNode.java#L514 the following precondition check was triggered in ResourceProfileBuilder: {code:java} Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set"); {code} See https://github.com/apache/impala/blob/d029ae53676c8637bc4f56b80b331920c8289108/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java#L79 This may be due to corrupt statistics but further investigation is necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11345) Query failed when creating equal conjunction map for Parquet bloom filter
[ https://issues.apache.org/jira/browse/IMPALA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11345. Resolution: Fixed > Query failed when creating equal conjunction map for Parquet bloom filter > - > > Key: IMPALA-11345 > URL: https://issues.apache.org/jira/browse/IMPALA-11345 > Project: IMPALA > Issue Type: Bug > Components: Backend, Distributed Exec >Affects Versions: Impala 4.1.0 > Environment: CentOS-7, Impala-4.1 >Reporter: Yuchen Fan >Assignee: Daniel Becker >Priority: Critical > > When querying a Hive table to which columns were added without using 'cascade', Impala > encounters an error like "Unable to find SchemaNode for path > 'db.table.column' in the schema of file > 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet file > named in the error log and found that its schema is not compatible with the table metadata. > The call stack is attached below; paths and table names are masked: > {code:java} > I0609 18:04:25.970052 115413 status.cc:129] > c94d0ab3fdf8f943:320300610002] Unable to find SchemaNode for path > 'xxx_db.xxx_table.xxx_column' in the schema of file > 'hdfs://xxx_nn/xxx_table_path/00_0'. > @ 0xea543b impala::Status::Status() > @ 0x1e3225c > impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap() > @ 0x1e363ea impala::HdfsParquetScanner::Open() > @ 0x19b40d0 > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x1b5cbae impala::HdfsScanNode::ProcessSplit() > @ 0x1b5e12a impala::HdfsScanNode::ScannerThread() > @ 0x1b5e9c6 > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x18eafa9 impala::Thread::SuperviseThread() > @ 0x18ee11a boost::detail::thread_data<>::run() > @ 0x2385510 thread_proxy > @ 0x7fb5b0745162 start_thread > @ 0x7fb5ad21df6c __clone{code} > The error may be related to > [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640].
The Bloom > filter requires the right-hand values of equality conjuncts to match the current > file schema. The filter is unavailable if the column does not exist in > all Parquet files scanned. I think we can disable the Parquet Bloom filter for > this query or scan node when such a situation is discovered. > How to reproduce (using impala-shell): > # create table parquet_test (id INT) stored as parquet; > # insert into parquet_test values (1),(2),(3); > # alter table parquet_test add columns (name STRING); > # insert into parquet_test values (4, "James"); > # select * from parquet_test where name in ("Lily"); > # An error occurs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
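The workaround suggested in the report (not using the Bloom filter for a column the file does not have, instead of failing the query) can be sketched as follows. Everything here is hypothetical and simplified: the file schema is modelled as a set of column names, `EqConjunctSketch` and `BuildEqConjunctMap` are illustrative names rather than `HdfsParquetScanner::CreateColIdx2EqConjunctMap()`, and the fix that was merged may behave differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical equality conjunct `col = value`, e.g. name = "Lily".
struct EqConjunctSketch {
  std::string col;
  std::string value;
};

// Maps each column to the literals its Bloom filter should be probed with.
// Conjuncts on columns missing from this file's schema (for example a column
// added by ALTER TABLE after the file was written) are skipped, so no Bloom
// filter is used for them and the query does not fail.
std::map<std::string, std::vector<std::string>> BuildEqConjunctMap(
    const std::set<std::string>& file_columns,
    const std::vector<EqConjunctSketch>& conjuncts) {
  std::map<std::string, std::vector<std::string>> result;
  for (const EqConjunctSketch& c : conjuncts) {
    if (file_columns.count(c.col) == 0) continue;  // column not in this file
    result[c.col].push_back(c.value);
  }
  return result;
}
```

In the reproduction above, the files written before `alter table` would have the schema `{"id"}`, so the `name` conjunct is simply not used for filtering those files, while the file written afterwards still gets its filter probed with `"Lily"`.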
[jira] [Resolved] (IMPALA-10918) Allow map type in SELECT list
[ https://issues.apache.org/jira/browse/IMPALA-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10918. Resolution: Implemented > Allow map type in SELECT list > - > > Key: IMPALA-10918 > URL: https://issues.apache.org/jira/browse/IMPALA-10918 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Daniel Becker >Priority: Major > Labels: complextype > > This covers collections: Map > Expected printout format: > Map: {"k1":2,"k2":null} -- This message was sent by Atlassian Jira (v8.20.10#820010)
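The expected printout format can be illustrated with a small formatter. This is a hypothetical sketch (C++17, for `std::optional`), not Impala's implementation: `FormatMap` is an illustrative name, the map is modelled as a vector of key/value pairs so entries keep their stored order, a NULL value is an empty `std::optional`, and keys are not escaped.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Renders a MAP<STRING,INT>-like value as {"k1":2,"k2":null}.
std::string FormatMap(
    const std::vector<std::pair<std::string, std::optional<int>>>& entries) {
  std::ostringstream out;
  out << '{';
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (i > 0) out << ',';
    out << '"' << entries[i].first << "\":";  // quoted key (no escaping here)
    if (entries[i].second) {
      out << *entries[i].second;  // non-NULL value
    } else {
      out << "null";  // NULL value
    }
  }
  out << '}';
  return out.str();
}
```

For example, `FormatMap({{"k1", 2}, {"k2", std::nullopt}})` returns the string `{"k1":2,"k2":null}`, matching the format above; an empty map prints as `{}`.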
[jira] [Resolved] (IMPALA-11427) TestOrcStats.test_orc_stats fails
[ https://issues.apache.org/jira/browse/IMPALA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11427. Resolution: Done > TestOrcStats.test_orc_stats fails > - > > Key: IMPALA-11427 > URL: https://issues.apache.org/jira/browse/IMPALA-11427 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build > > In one of the builds, query_test.test_orc_stats.TestOrcStats.test_orc_stats > fails: > {code:java} > query_test/test_orc_stats.py:40: in test_orc_stats > self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database) > common/impala_test_suite.py:820: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:665: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over RowsRead did not match expected > results. > E EXPECTED VALUE: > E 5 > E > E > E ACTUAL VALUE: > E 0 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-10753) Incorrect length when multiple CHAR(N) values are inserted
[ https://issues.apache.org/jira/browse/IMPALA-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10753. Resolution: Cannot Reproduce > Incorrect length when multiple CHAR(N) values are inserted > -- > > Key: IMPALA-10753 > URL: https://issues.apache.org/jira/browse/IMPALA-10753 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Csaba Ringhofer >Assignee: Daniel Becker >Priority: Minor > Labels: correctness, ramp-up > > To reproduce: > {code} > CREATE TABLE impala_char_insert (s STRING); > -- all values are CHAR(N) with different N, but all will use the biggest N > INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" > AS CHAR(2))), (CAST("123" AS CHAR(3))); > SELECT length(s) FROM impala_char_insert; > results: > 3 > 3 > 3 > -- inserting the same values in separate INSERTs works correctly > INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))); > INSERT INTO impala_char_insert VALUES (CAST("12" AS CHAR(2))); > INSERT INTO impala_char_insert VALUES (CAST("123" AS CHAR(3))); > SELECT length(s) FROM impala_char_insert; > results: > 1 > 2 > 3 > -- if one value is not CHAR(N), then the lengths are correct > INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" > AS VARCHAR(2))), (CAST("123" AS CHAR(3))); > SELECT length(s) FROM impala_char_insert; > results: > 1 > 2 > 3 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11432) TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup
[ https://issues.apache.org/jira/browse/IMPALA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11432. Resolution: Cannot Reproduce > TestRanger.test_grant_revoke_with_role fails with impalad stuck at startup > -- > > Key: IMPALA-11432 > URL: https://issues.apache.org/jira/browse/IMPALA-11432 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build > > In one of the exhaustive builds > authorization.test_ranger.TestRanger.test_grant_revoke_with_role failed with > one of the impalads stuck during startup: > Stacktrace > {code:java} > common/custom_cluster_test_suite.py:181: in setup_method > self._start_impala_cluster(cluster_args, **kwargs) > common/custom_cluster_test_suite.py:285: in _start_impala_cluster > check_call(cmd + options, close_fds=True) > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/Impala-Toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py:190: > in check_call > raise CalledProcessError(retcode, cmd) > E CalledProcessError: Command > '['/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py', > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', > '--num_coordinators=3', > '--log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests', > '--log_level=1', '--impalad_args=--server-name=server1 > --ranger_service_type=hive --ranger_app_id=impala > --authorization_provider=ranger > --use_customized_user_groups_mapper_for_ranger ', '--state_store_args=None ', > '--catalogd_args=--server-name=server1 --ranger_service_type=hive > --ranger_app_id=impala --authorization_provider=ranger > --use_customized_user_groups_mapper_for_ranger ', > '--impalad_args=--default_query_options=']' 
returned non-zero exit status 1 > {code} > Standard Error > {code:java} > -- 2022-07-14 01:07:04,943 INFO MainThread: Starting cluster with > command: > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/bin/start-impala-cluster.py > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args=--server-name=server1 > --ranger_service_type=hive --ranger_app_id=impala > --authorization_provider=ranger > --use_customized_user_groups_mapper_for_ranger ' '--state_store_args=None ' > '--catalogd_args=--server-name=server1 --ranger_service_type=hive > --ranger_app_id=impala --authorization_provider=ranger > --use_customized_user_groups_mapper_for_ranger ' > --impalad_args=--default_query_options= > 01:07:05 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 01:07:05 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 01:07:05 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 01:07:05 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 01:07:05 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 01:07:05 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 01:07:08 MainThread: Found 2 impalad/1 
statestored/1 catalogd process(es) > 01:07:09 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:10 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:11 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:12 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:13 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:14 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:15 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:16 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) > 01:07:17 MainThread: Found 2 impalad/1 statestored/1 catalogd process(e
[jira] [Resolved] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11431. Resolution: Cannot Reproduce > TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an > exhaustive build > > > Key: IMPALA-11431 > URL: https://issues.apache.org/jira/browse/IMPALA-11431 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build > > In one of the exhaustive builds, > query_test.test_nested_types.TestComputeStatsWithNestedTypes.test_compute_stats_with_structs > fails: > {code:java} > query_test/test_nested_types.py:252: in test_compute_stats_with_structs > self.run_test_case('QueryTest/compute-stats-with-structs', vector) > common/impala_test_suite.py:778: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:588: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:469: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E > 'alltypes','STRUCT',-1,-1,-1,-1.0,-1,-1 > == > 'alltypes','STRUCT',-1,-1,-1,-1,-1,-1 > E 'id','INT',6,0,4,4.0,-1,-1 != 'id','INT',-1,-1,4,4,-1,-1 > E 'small_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == > 'small_struct','STRUCT',-1,-1,-1,-1,-1,-1 > E 'str','STRING',6,0,11,10.330154,-1,-1 != > 'str','STRING',-1,-1,-1,-1,-1,-1 > E 'tiny_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == > 'tiny_struct','STRUCT',-1,-1,-1,-1,-1,-1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11425) Python TypeError: super() takes at least 1 argument (0 given)
[ https://issues.apache.org/jira/browse/IMPALA-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11425. Resolution: Duplicate > Python TypeError: super() takes at least 1 argument (0 given) > - > > Key: IMPALA-11425 > URL: https://issues.apache.org/jira/browse/IMPALA-11425 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Blocker > Labels: broken-build > > The following error happens in various builds during tarball creation: > {code:java} > Traceback (most recent call last): > File "setup.py", line 167, in > 'Topic :: Database :: Front-Ends' > File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup > dist.run_commands() > File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands > self.run_command(cmd) > File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command > cmd_obj.run() > File > "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/command/sdist.py", > line 153, in run > self.run_command(cmd_name) > File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command > self.distribution.run_command(command) > File "/usr/lib64/python2.7/distutils/dist.py", line 970, in run_command > cmd_obj = self.get_command_obj(command) > File "/usr/lib64/python2.7/distutils/dist.py", line 845, in get_command_obj > klass = self.get_command_class(command) > File > "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/setuptools/dist.py", line > 410, in get_command_class > return _Distribution.get_command_class(self, command) > File "/usr/lib64/python2.7/distutils/dist.py", line 815, in > get_command_class > __import__ (module_name) > File "/usr/lib64/python2.7/distutils/command/check.py", line 13, in > from docutils.utils import Reporter > File > "/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", > line 123, in > release=True # True for official releases and pre-releases > File > 
"/tmp/impala-venv-huM4f/lib/python2.7/site-packages/docutils/__init__.py", > line 93, in __new__ > return super().__new__(cls, major, minor, micro, > TypeError: super() takes at least 1 argument (0 given) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
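The error above has a simple cause. The zero-argument form of super() exists only in Python 3, so the docutils code in the traceback cannot run under the python2.7 interpreter used for tarball creation. The issue was closed as a duplicate, so the actual fix is tracked elsewhere. The minimal sketch below shows the explicit two-argument form that Python-2-compatible code has to use. The VersionInfo class is hypothetical and only loosely modelled on the version tuple in the traceback.

```python
from collections import namedtuple


class VersionInfo(namedtuple("VersionInfo", "major minor micro")):
    """Hypothetical version tuple, loosely modelled on the one in the traceback."""

    __slots__ = ()

    def __new__(cls, major=0, minor=0, micro=0):
        # The explicit two-argument form works on both Python 2.7 and Python 3.
        # The zero-argument form, super().__new__(...), exists only in Python 3;
        # on Python 2.7 it raises "super() takes at least 1 argument (0 given)".
        return super(VersionInfo, cls).__new__(cls, major, minor, micro)


v = VersionInfo(0, 19, 1)
print(v.major, v.minor, v.micro)
```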
[jira] [Created] (IMPALA-11643) Implement ColumnType::ToIR() for non-scalar types
Daniel Becker created IMPALA-11643: -- Summary: Implement ColumnType::ToIR() for non-scalar types Key: IMPALA-11643 URL: https://issues.apache.org/jira/browse/IMPALA-11643 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker Currently ColumnType::ToIR() is only implemented for scalar types. It should be extended to support all types. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11645) Remove PrintThriftEnum functions in debug-utils.cc
Daniel Becker created IMPALA-11645: -- Summary: Remove PrintThriftEnum functions in debug-utils.cc Key: IMPALA-11645 URL: https://issues.apache.org/jira/browse/IMPALA-11645 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Daniel Becker Before IMPALA-5690, we implemented operator<< for Thrift enums in Impala code. These functions printed the names of the enums. Then we upgraded to Thrift 0.9.3, which included THRIFT-2067. That change implemented operator<< for Thrift enums, but it printed the numeric values of enums instead of their names. To preserve the old behaviour in Impala, we renamed our own implementations of operator<< to PrintThriftEnum, a function that we defined for each Thrift enum we used, and which returned a string with the names - not the numbers - of the enums. After upgrading Thrift to a version that included THRIFT-3921 (any version starting from 0.11.0), these PrintThriftEnum functions are no longer necessary, as the operator<< provided by Thrift now prints the names of enums, which is the behaviour we want. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-10356) Analyzed query in explain plan is not quite right for insert with values clause
[ https://issues.apache.org/jira/browse/IMPALA-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-10356. Resolution: Fixed > Analyzed query in explain plan is not quite right for insert with values > clause > --- > > Key: IMPALA-10356 > URL: https://issues.apache.org/jira/browse/IMPALA-10356 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0.0 >Reporter: Tim Armstrong >Assignee: Daniel Becker >Priority: Major > Labels: newbie, ramp-up > > In impala-shell: > {noformat} > create table double_tbl (d double) stored as textfile; > set explain_level=2; > explain insert into double_tbl values (-0.43149576573887316); > {noformat} > {noformat} > +--+ > | Explain String > | > +--+ > | Max Per-Host Resource Reservation: Memory=0B Threads=1 > | > | Per-Host Resource Estimates: Memory=10MB > | > | Codegen disabled by planner > | > | Analyzed query: SELECT CAST(-0.43149576573887316 AS DECIMAL(17,17)) UNION > SELECT | > | CAST(-0.43149576573887316 AS DECIMAL(17,17)) > | > | > | > | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | > | | Per-Host Resources: mem-estimate=8B mem-reservation=0B > thread-reservation=1 | > | WRITE TO HDFS [default.double_tbl, OVERWRITE=false] > | > | | partitions=1 > | > | | output exprs: CAST(-0.43149576573887316 AS DOUBLE) > | > | | mem-estimate=8B mem-reservation=0B thread-reservation=0 > | > | | > | > | 00:UNION > | > |constant-operands=1 > | > |mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > |tuple-ids=0 row-size=8B cardinality=1 > | > |in pipelines: > | > +--+ > {noformat} > The analyzed query does not make sense. We should investigate and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11623) Put *-ir.cc files into their own libraries to avoid extra recompilation
[ https://issues.apache.org/jira/browse/IMPALA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11623. Resolution: Implemented > Put *-ir.cc files into their own libraries to avoid extra recompilation > --- > > Key: IMPALA-11623 > URL: https://issues.apache.org/jira/browse/IMPALA-11623 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.2.0 >Reporter: Joe McDonnell >Assignee: Daniel Becker >Priority: Major > > It is desirable to be able to iterate quickly by running "make -j impalad" > while modifying a file. Currently, modifying most files incurs a rebuild of > the LLVM IR, which is a slow serial step. For example: > > {noformat} > $ touch be/src/runtime/coordinator.cc > $ make -j impalad > ... > [ 98%] Generating ../../../llvm-ir/impala.bc > [ 98%] Generating ../../../llvm-ir/impala-legacy-avx.bc > [ 98%] Generating ../../generated-sources/impala-ir/impala-ir.cc > [ 98%] Generating ../../generated-sources/impala-ir/impala-ir-legacy-avx.cc > ...{noformat} > This can add several seconds to an incremental build. This step happens for > files that do not actually impact the LLVM IR, so there are ways to avoid > this. > The LLVM IR is rebuilt because it has dependencies on Exec, > Exprs, Runtime, Udf, Util, and other libraries: > > {noformat} > add_custom_command( > OUTPUT ${IR_OUTPUT_FILE} > COMMAND ${LLVM_CLANG_EXECUTABLE} ${CLANG_IR_CXX_FLAGS} > ${PLATFORM_SPECIFIC_FLAGS} > ${CLANG_INCLUDE_FLAGS} ${IR_INPUT_FILES} -o ${IR_TMP_OUTPUT_FILE} > COMMAND ${LLVM_OPT_EXECUTABLE} ${LLVM_OPT_IR_FLAGS} < ${IR_TMP_OUTPUT_FILE} > > ${IR_OUTPUT_FILE} > COMMAND rm ${IR_TMP_OUTPUT_FILE} > DEPENDS Exec ExecAvro ExecKudu Exprs Runtime Udf Util ${IR_INPUT_FILES} > ){noformat} > From a correctness perspective, the LLVM IR only cares about things that > impact the content of the *-ir.cc files, because impala-ir.cc includes every > *-ir.cc file. 
That list of libraries is a superset of what is needed. > If the *-ir.cc files were split off into their own libraries (i.e. ExecIr > rather than Exec), then this target would only depend on the ExecIr rather > than the larger Exec. This would reduce the number of files that would cause > LLVM IR to be rebuilt. That should reduce the runtime of an incremental "make > -j impalad" for quite a few C++ files. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
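The following is a heavily simplified, hypothetical model of the proposal. It shows which touched files make the LLVM IR target out of date before and after the split. The library names come from the DEPENDS line above. The *Ir library names, the directory-to-library mapping and the exec/example-ir.cc file are assumptions made for illustration only.

```python
# Hypothetical, simplified model of the IMPALA-11623 build graph.
DIR_TO_LIB = {"exec": "Exec", "exprs": "Exprs", "runtime": "Runtime",
              "udf": "Udf", "util": "Util"}

# Today the IR target depends on whole libraries (see the DEPENDS line).
IR_DEPS_BEFORE = {"Exec", "ExecAvro", "ExecKudu", "Exprs", "Runtime", "Udf", "Util"}
# Proposed: depend only on libraries that hold the *-ir.cc files.
IR_DEPS_AFTER = {lib + "Ir" for lib in DIR_TO_LIB.values()}


def owning_library(path, split_ir):
    """Return the library that compiles `path`, e.g. 'runtime/coordinator.cc'."""
    lib = DIR_TO_LIB[path.split("/")[0]]
    if split_ir and path.endswith("-ir.cc"):
        return lib + "Ir"
    return lib


def triggers_ir_rebuild(path, split_ir):
    """True if touching `path` makes the LLVM IR target out of date."""
    deps = IR_DEPS_AFTER if split_ir else IR_DEPS_BEFORE
    return owning_library(path, split_ir) in deps


for path in ["runtime/coordinator.cc", "exec/example-ir.cc"]:
    print(path, triggers_ir_rebuild(path, False), triggers_ir_rebuild(path, True))
```

The model only considers which library compiles the touched file. It ignores header dependencies, which in the real CMake build can still pull the IR target in.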
[jira] [Resolved] (IMPALA-11581) ALTER TABLE RENAME TO doesn't update transient_lastDdlTime
[ https://issues.apache.org/jira/browse/IMPALA-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11581. Resolution: Fixed > ALTER TABLE RENAME TO doesn't update transient_lastDdlTime > -- > > Key: IMPALA-11581 > URL: https://issues.apache.org/jira/browse/IMPALA-11581 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Daniel Becker >Priority: Major > Labels: ramp-up > > ALTER TABLE RENAME TO doesn't update transient_lastDdlTime. > The following statements behave differently when executed via Hive or Impala: > {noformat} > CREATE TABLE rename_from (i int); > ALTER TABLE rename_from RENAME TO rename_to; > {noformat} > During ALTER TABLE ... RENAME TO ... Hive updates transient_lastDdlTime while > Impala leaves it unchanged. > Impala should follow Hive's behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010)
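The following minimal sketch shows the Hive-like behaviour. It assumes that a table is a plain dict holding a name and HMS-style parameters, and that transient_lastDdlTime is stored as a string of seconds since the epoch. rename_table() is a hypothetical helper, not Impala's catalog code.

```python
import time


def rename_table(table, new_name, now=None):
    """Return a renamed copy of `table` with transient_lastDdlTime refreshed.

    `table` is a hypothetical dict: {"name": str, "params": {str: str}}.
    The timestamp is stored as a string of seconds since the epoch (assumed).
    """
    ts = int(time.time() if now is None else now)
    params = dict(table["params"])  # copy so the original table is untouched
    params["transient_lastDdlTime"] = str(ts)
    return {"name": new_name, "params": params}


old = {"name": "rename_from", "params": {"transient_lastDdlTime": "1000"}}
new = rename_table(old, "rename_to", now=2000)
print(new)
```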
[jira] [Resolved] (IMPALA-11645) Remove PrintThriftEnum functions in debug-utils.cc
[ https://issues.apache.org/jira/browse/IMPALA-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11645. Resolution: Implemented > Remove PrintThriftEnum functions in debug-utils.cc > -- > > Key: IMPALA-11645 > URL: https://issues.apache.org/jira/browse/IMPALA-11645 > Project: IMPALA > Issue Type: Improvement >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > Before IMPALA-5690 we implemented operator<< for Thrift enums in Impala code. > These functions printed the names of the enums. > Then we upgraded to Thrift 0.9.3, but that release included THRIFT-2067, > which implemented operator<< for Thrift enums, but printed the number value > of enums instead of their names. To preserve the old behaviour in Impala, we > renamed our own implementations of operator<< to PrintThriftEnum, a function > that we defined for each Thrift enum we used, and which returned a string > with the names - not the numbers - of the enums. > After upgrading Thrift to a version that included THRIFT-3921 (any version > starting from 0.11.0), these PrintThriftEnum functions are no longer > necessary as the operator<< provided by Thrift now prints the names of enums, > which is the behaviour we want. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11462) shiftleft problem
[ https://issues.apache.org/jira/browse/IMPALA-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11462. Resolution: Fixed > shiftleft problem > - > > Key: IMPALA-11462 > URL: https://issues.apache.org/jira/browse/IMPALA-11462 > Project: IMPALA > Issue Type: Bug > Components: Clients >Affects Versions: Impala 3.4.1 >Reporter: jack sun >Assignee: Daniel Becker >Priority: Minor > Attachments: screenshot-1.png > > > If the second parameter of the function 'shiftleft' is a dynamic value, the type of the first parameter is changed to TINYINT. > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
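The inferred type matters here. The following assumes, as Impala's bit-function documentation describes, that shiftleft returns the same type as its first argument. An 8-bit TINYINT then discards bits far earlier than a BIGINT. The sketch emulates a left shift inside a fixed-width two's-complement integer. It is a model for illustration, not Impala's implementation, and it does not claim how Impala handles shift counts at or beyond the type width.

```python
def shift_left(value, positions, bits):
    """Shift `value` left inside a `bits`-wide two's-complement integer.

    Bits pushed past the most significant position are discarded and zeros
    fill in from the right; `positions` must be non-negative.
    """
    mask = (1 << bits) - 1
    raw = (value << positions) & mask
    # Reinterpret the top bit as the sign bit.
    if raw >= 1 << (bits - 1):
        raw -= 1 << bits
    return raw


for bits, name in [(8, "TINYINT"), (64, "BIGINT")]:
    print(name, [shift_left(1, n, bits) for n in (6, 7, 8)])
```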
[jira] [Created] (IMPALA-11685) Slot memory sharing between struct and field not working if the field is also a struct
Daniel Becker created IMPALA-11685: -- Summary: Slot memory sharing between struct and field not working if the field is also a struct Key: IMPALA-11685 URL: https://issues.apache.org/jira/browse/IMPALA-11685 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Daniel Becker Assignee: Daniel Becker IMPALA-10838 introduced slot memory sharing: if a struct and one of its fields are both present in the select list, no extra slot is generated in the row for the struct field; instead, the memory of the struct is reused, i.e. the row size is the same as when only the struct is queried. It works when the struct field is a primitive type: {code:java} explain select id, outer_struct from functional_orc_def.complextypes_nested_structs; row-size=64B{code} {code:java} explain select id, outer_struct, outer_struct.str from functional_orc_def.complextypes_nested_structs; row-size=64B{code} However, it does not work if the field is itself a struct: {code:java} explain select id, outer_struct, outer_struct.inner_struct3 from functional_orc_def.complextypes_nested_structs; row-size=80B{code} This is because struct slot descriptors are registered before others so that it is easier to reuse the slot memory of the struct fields, but struct slot descriptors among themselves are sorted in the wrong order (see [https://github.com/apache/impala/blob/c12ac6c27b2df1eae693b44c157d65499f491d21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L340]). -- This message was sent by Atlassian Jira (v8.20.10#820010)
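The following hypothetical model illustrates the ordering problem. Struct paths are tuples of field names. A struct nested in an already registered struct reuses that struct's memory instead of getting its own slot. When ancestors are registered first (here by sorting on path length), outer_struct.inner_struct3 shares the memory of outer_struct. When the nested struct is processed first, both get their own slots, which mirrors the 64B vs 80B difference above. This sketches the idea only; it is not the SelectStmt code.

```python
def slots_with_own_memory(struct_paths, ancestor_first=True):
    """Return the struct paths that end up with their own slot.

    Paths are tuples such as ("outer_struct", "inner_struct3"). A path whose
    enclosing struct is already registered shares that struct's memory.
    """
    # sorted() is stable, so paths of equal depth keep their original order.
    order = sorted(struct_paths, key=len) if ancestor_first else list(struct_paths)
    registered = []
    for path in order:
        if any(path[:i] in registered for i in range(1, len(path))):
            continue  # reuses the memory of an enclosing struct
        registered.append(path)
    return registered


paths = [("outer_struct", "inner_struct3"), ("outer_struct",)]
print(slots_with_own_memory(paths))
print(slots_with_own_memory(paths, ancestor_first=False))
```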
[jira] [Created] (IMPALA-11687) Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails
Daniel Becker created IMPALA-11687: -- Summary: Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails Key: IMPALA-11687 URL: https://issues.apache.org/jira/browse/IMPALA-11687 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Daniel Becker Assignee: Daniel Becker If EXPAND_COMPLEX_TYPES is set to true, some queries that combine star expressions and explicitly given complex columns fail: {code:java} select outer_struct, * from functional_orc_def.complextypes_nested_structs; ERROR: IllegalStateException: Illegal reference to non-materialized slot: tid=1 sid=1{code} {code:java} select *, outer_struct.str from functional_orc_def.complextypes_nested_structs; ERROR: IllegalStateException: null{code} Having two stars in a table with complex columns also fails. {code:java} select *, * from functional_orc_def.complextypes_nested_structs;{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11685) Slot memory sharing between struct and field not working if the field is also a struct
[ https://issues.apache.org/jira/browse/IMPALA-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11685. Resolution: Fixed > Slot memory sharing between struct and field not working if the field is also > a struct > -- > > Key: IMPALA-11685 > URL: https://issues.apache.org/jira/browse/IMPALA-11685 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > IMPALA-10838 introduced slot memory sharing: if a struct and one of its fields > are both present in the select list, no extra slot is generated in the row for > the struct field; instead, the memory of the struct is reused, i.e. the row > size is the same as when only the struct is queried. It works when the struct > field is a primitive type: > {code:java} > explain select id, outer_struct from > functional_orc_def.complextypes_nested_structs; > row-size=64B{code} > {code:java} > explain select id, outer_struct, outer_struct.str from > functional_orc_def.complextypes_nested_structs; > row-size=64B{code} > However, it does not work if the field is itself a struct: > {code:java} > explain select id, outer_struct, outer_struct.inner_struct3 from > functional_orc_def.complextypes_nested_structs; > row-size=80B{code} > This is because struct slot descriptors are registered before others so that > it is easier to reuse the slot memory of the struct fields, but struct slot > descriptors among themselves are sorted in the wrong order (see > [https://github.com/apache/impala/blob/c12ac6c27b2df1eae693b44c157d65499f491d21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L340]). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11692) Struct slot memory sharing involving select * not working properly
Daniel Becker created IMPALA-11692: -- Summary: Struct slot memory sharing involving select * not working properly Key: IMPALA-11692 URL: https://issues.apache.org/jira/browse/IMPALA-11692 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker With EXPAND_COMPLEX_TYPES=1, if there are structs coming from the star expansion and members of the structs are also given explicitly, slot memory sharing does not work in some cases: {code:java} explain select * from functional_orc_def.complextypes_nested_structs; row-size=64B{code} {code:java} explain select *, outer_struct.inner_struct1 from functional_orc_def.complextypes_nested_structs; row-size=80B{code} The row size should be the same in both cases because outer_struct.inner_struct1 is part of outer_struct, which is included in the star. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11687) Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails
[ https://issues.apache.org/jira/browse/IMPALA-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11687. Resolution: Fixed > Select * with EXPAND_COMPLEX_TYPES=1 and explicit complex types fails > - > > Key: IMPALA-11687 > URL: https://issues.apache.org/jira/browse/IMPALA-11687 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > If EXPAND_COMPLEX_TYPES is set to true, some queries that combine star > expressions and explicitly given complex columns fail: > {code:java} > select outer_struct, * from functional_orc_def.complextypes_nested_structs; > ERROR: IllegalStateException: Illegal reference to non-materialized slot: > tid=1 sid=1{code} > {code:java} > select *, outer_struct.str from > functional_orc_def.complextypes_nested_structs; > ERROR: IllegalStateException: null{code} > Having two stars in a table with complex columns also fails. > {code:java} > select *, * from functional_orc_def.complextypes_nested_structs; > ERROR: IllegalStateException: Illegal reference to non-materialized slot: > tid=6 sid=13{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11712) Sort out column masking with complex types
Daniel Becker created IMPALA-11712: -- Summary: Sort out column masking with complex types Key: IMPALA-11712 URL: https://issues.apache.org/jira/browse/IMPALA-11712 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Daniel Becker We determine whether a SlotDescriptor created from a star expanded path should be registered for column masking based on the path of the star item: ??Empty matched types means this is expanded from star of a catalog table.?? ??For star of complex types, e.g. my_struct.*, my_array.*, my_map.*, the matched?? ??types will have the complex type so it's not empty.?? [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L659] However, this comment may be wrong because in the query {code:java} select a.* from mix_struct_array t, t.struct_in_arr a;{code} {{getMatchedTypes()}} returns an empty list for the star path even though it is not from a catalog table. We should also find out whether we can determine from the expanded path alone (and not the path of the star item) whether we need to register it for column masking, for example by checking if it is within a complex type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
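The following sketch illustrates the "check whether it is within a complex type" idea. It assumes the resolved path is available as a list of types, running from the top-level column down to the expanded item. needs_column_masking() is hypothetical. Whether this rule matches the intended masking semantics is exactly what the issue asks to verify.

```python
COMPLEX_TYPES = {"STRUCT", "ARRAY", "MAP"}


def needs_column_masking(path_types):
    """Decide masking from the expanded path alone (hypothetical rule).

    `path_types` lists the type of each step of the resolved path, from the
    top-level table column down to the expanded item, e.g.
    ["ARRAY", "STRUCT", "INT"] for a field of a struct inside an array.
    Only items not nested inside a complex type are treated as top-level
    table columns and registered for masking.
    """
    return all(t not in COMPLEX_TYPES for t in path_types[:-1])


# A top-level INT column, a top-level STRUCT column, and a field expanded
# from a.* where a is an array of structs (as in the query above).
print(needs_column_masking(["INT"]),
      needs_column_masking(["STRUCT"]),
      needs_column_masking(["ARRAY", "STRUCT", "INT"]))
```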
[jira] [Resolved] (IMPALA-11692) Struct slot memory sharing involving select * not working properly
[ https://issues.apache.org/jira/browse/IMPALA-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11692. Resolution: Fixed > Struct slot memory sharing involving select * not working properly > --- > > Key: IMPALA-11692 > URL: https://issues.apache.org/jira/browse/IMPALA-11692 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > With EXPAND_COMPLEX_TYPES=1, if there are structs coming from the star > expansion and members of the structs are also given explicitly, slot memory > sharing does not work in some cases: > {code:java} > explain select * from functional_orc_def.complextypes_nested_structs; > row-size=64B{code} > {code:java} > explain select *, outer_struct.inner_struct1 from > functional_orc_def.complextypes_nested_structs; > row-size=80B{code} > The row size should be the same in both cases as outer_struct.inner_struct1 > is part of outer_struct which is included in the star. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11717) Use rapidjson for printing collections
Daniel Becker created IMPALA-11717: -- Summary: Use rapidjson for printing collections Key: IMPALA-11717 URL: https://issues.apache.org/jira/browse/IMPALA-11717 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker We use rapidjson to print structs, but not collections (arrays and maps). We should also switch to rapidjson for collections to have a uniform approach. This is also needed if we want to support embedding structs and collections in each other; see [IMPALA-9551|https://issues.apache.org/jira/browse/IMPALA-9551]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
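The following minimal Python sketch shows the "one writer for everything" approach, in the spirit of a streaming JSON writer such as rapidjson's. A single recursive function emits structs, arrays and maps, so nesting any of them inside the others needs no extra code. Modelling structs and maps as dicts and arrays as lists is an assumption of the sketch. Map keys are stringified because JSON object keys must be strings.

```python
import json


def write_json(value, out):
    """Append compact JSON for `value` to the list `out`, recursing into containers."""
    if isinstance(value, dict):          # STRUCT (or MAP)
        out.append("{")
        for i, (key, item) in enumerate(value.items()):
            if i:
                out.append(",")
            out.append(json.dumps(str(key)))
            out.append(":")
            write_json(item, out)
        out.append("}")
    elif isinstance(value, list):        # ARRAY
        out.append("[")
        for i, item in enumerate(value):
            if i:
                out.append(",")
            write_json(item, out)
        out.append("]")
    else:                                # scalar: None -> null, strings quoted
        out.append(json.dumps(value))


def to_json(value):
    out = []
    write_json(value, out)
    return "".join(out)


# An array of structs: the nesting that motivates a single code path.
print(to_json([{"i": 1, "s": None}, {"i": None, "s": "x"}]))
```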
[jira] [Created] (IMPALA-11719) Inconsistency in printing NULL values
Daniel Becker created IMPALA-11719: -- Summary: Inconsistency in printing NULL values Key: IMPALA-11719 URL: https://issues.apache.org/jira/browse/IMPALA-11719 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker If they are top level or in collections, null values are printed as "NULL": {code:java} select int_array from functional_parquet.complextypestbl; ++ | int_array | ++ | [-1] | | [1,2,3] | | [NULL,1,2,NULL,3,NULL] | | [] | | NULL | | NULL | | NULL | | NULL | ++{code} If they are in a struct, they are printed as "null": {code:java} select small_struct from functional_parquet.complextypes_structs; ++ | small_struct | ++ | NULL | | {"i":19191,"s":"small_struct_str"} | | {"i":98765,"s":null} | | {"i":null,"s":"str"} | | {"i":98765,"s":"abcde f"} | | {"i":null,"s":null} | ++{code} In Hive the situation is a bit different: "NULL" is used only for top level values and "null" is printed in both collections and structs. {code:java} select int_array from functional_parquet.complextypestbl; +-+ | int_array | +-+ | [-1] | | [1,2,3] | | [null,1,2,null,3,null] | | [] | | NULL | | NULL | | NULL | | NULL | +-+{code} {code:java} select small_struct from functional_parquet.complextypes_structs; +-+ | small_struct | +-+ | NULL | | {"i":19191,"s":"small_struct_str"} | | {"i":98765,"s":null} | | {"i":null,"s":"str"} | | {"i":98765,"s":"abcde f"} | | {"i":null,"s":null} | +-+{code} In JSON the relevant keyword is "null". We should decide how we handle this situation. # Have a uniform NULL representation everywhere: top level, collections and structs ** either "NULL" or "null" everywhere # Have "NULL" on the top level and "null" in collections and structs, like Hive # Leave everything as it is now: "NULL" at the top level and in collections, "null" in structs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-11722) Wrong error message when unsupported complex type comes from * expression
Daniel Becker created IMPALA-11722: -- Summary: Wrong error message when unsupported complex type comes from * expression Key: IMPALA-11722 URL: https://issues.apache.org/jira/browse/IMPALA-11722 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Daniel Becker The following query fails with a NullPointerException: {code:java} select * from functional_orc_def.complextypestbl; ERROR: NullPointerException: null {code} The table contains a struct, {{nested_struct}}, which is not supported yet because it contains collections. If the columns are listed explicitly, the error message is the correct one: {code:java} select id, int_array, int_array_array, int_map, int_map_array, nested_struct from functional_orc_def.complextypestbl; ERROR: AnalysisException: Struct containing a collection type is not allowed in the select list.{code} The same error message should be returned in the select * case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
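The following sketch shows the intended behaviour, using a hypothetical (kind, children) tuple representation of column types. The same recursive check runs on every select-list column, whether the column was listed explicitly or produced by expanding *. Both paths therefore report the same message, so neither hits a NullPointerException.

```python
COLLECTION_KINDS = {"ARRAY", "MAP"}


def contains_collection(col_type):
    """True if `col_type` is, or has anywhere below it, an ARRAY or MAP.

    Types are hypothetical (kind, children) tuples, e.g.
    ("STRUCT", [("INT", []), ("ARRAY", [("INT", [])])]).
    """
    kind, children = col_type
    return kind in COLLECTION_KINDS or any(contains_collection(c) for c in children)


def check_select_list(columns):
    """Validate (name, type) pairs, whether listed explicitly or expanded from *."""
    for _, col_type in columns:
        if col_type[0] == "STRUCT" and contains_collection(col_type):
            raise ValueError("Struct containing a collection type is not allowed "
                             "in the select list.")


nested_struct = ("STRUCT", [("INT", []), ("ARRAY", [("INT", [])])])
star_expansion = [("id", ("BIGINT", [])), ("nested_struct", nested_struct)]
try:
    check_select_list(star_expansion)
except ValueError as e:
    print(e)
```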
[jira] [Resolved] (IMPALA-11719) Inconsistency in printing NULL values
[ https://issues.apache.org/jira/browse/IMPALA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker resolved IMPALA-11719. Resolution: Fixed > Inconsistency in printing NULL values > - > > Key: IMPALA-11719 > URL: https://issues.apache.org/jira/browse/IMPALA-11719 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > If they are top level or in collections, null values are printed as "NULL": > {code:java} > select int_array from functional_parquet.complextypestbl; > ++ > | int_array | > ++ > | [-1] | > | [1,2,3] | > | [NULL,1,2,NULL,3,NULL] | > | [] | > | NULL | > | NULL | > | NULL | > | NULL | > ++{code} > If they are in a struct, they are printed as "null": > {code:java} > select small_struct from functional_parquet.complextypes_structs; > ++ > | small_struct | > ++ > | NULL | > | {"i":19191,"s":"small_struct_str"} | > | {"i":98765,"s":null} | > | {"i":null,"s":"str"} | > | {"i":98765,"s":"abcde f"} | > | {"i":null,"s":null} | > ++{code} > In Hive the situation is a bit different: "NULL" is used only for top level > values and "null" is printed in both collections and structs. > {code:java} > select int_array from functional_parquet.complextypestbl; > +-+ > | int_array | > +-+ > | [-1] | > | [1,2,3] | > | [null,1,2,null,3,null] | > | [] | > | NULL | > | NULL | > | NULL | > | NULL | > +-+{code} > {code:java} > select small_struct from functional_parquet.complextypes_structs; > +-+ > | small_struct | > +-+ > | NULL | > | {"i":19191,"s":"small_struct_str"} | > | {"i":98765,"s":null} | > | {"i":null,"s":"str"} | > | {"i":98765,"s":"abcde f"} | > | {"i":null,"s":null} | > +-+{code} > Officially we print collections and structs in JSON form. In JSON the > relevant keyword is "null". > We should decide how we handle this situation. 
> # Have a uniform NULL representation everywhere: top level, collections and > structs > ** either "NULL" or "null" everywhere > # Have "NULL" on the top level and "null" in collections and structs, like > Hive > # Leave everything as it is now: "NULL" at the top level and in collections, > "null" in structs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
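The following minimal sketch implements option 2 (Hive's behaviour). It assumes values arrive as Python objects: None for NULL, lists for arrays, dicts for structs. The top-level NULL is special-cased, and everything nested is left to the JSON encoder, which emits null.

```python
import json


def format_cell(value):
    """Option 2: NULL at the top level, JSON null inside arrays and structs.

    Values are hypothetical Python objects: None for NULL, lists for arrays,
    dicts for structs, plain scalars otherwise.
    """
    if value is None:
        return "NULL"
    if isinstance(value, (list, dict)):
        return json.dumps(value, separators=(",", ":"))
    return str(value)


for v in [None, [None, 1, 2, None, 3, None], {"i": 98765, "s": None}]:
    print(format_cell(v))
```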
[jira] [Created] (IMPALA-11734) TestIcebergTable.test_compute_stats fails in RELEASE builds
Daniel Becker created IMPALA-11734: -- Summary: TestIcebergTable.test_compute_stats fails in RELEASE builds Key: IMPALA-11734 URL: https://issues.apache.org/jira/browse/IMPALA-11734 Project: IMPALA Issue Type: Improvement Reporter: Daniel Becker Assignee: Daniel Becker If the Impala version is set to a release build as described in point 8 of the "How to Release" document ([https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate]), TestIcebergTable.test_compute_stats fails: h3. Stacktrace {code:java} query_test/test_iceberg.py:852: in test_compute_stats self.run_test_case('QueryTest/iceberg-compute-stats', vector, unique_database) common/impala_test_suite.py:742: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:578: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:469: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 2,1,'2.33KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes' != 2,1,'2.32KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'{code} The problem is the file size, which is 2.32KB instead of 2.33KB. This is because the version is written into the file, and "x.y.z-RELEASE" is one byte shorter than "x.y.z-SNAPSHOT". The size of the file in this test is on the boundary between 2.32KB and 2.33KB, so this one byte can change the value. We could use a row_regex to accept both values so that the test works for both snapshot and release versions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
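The following worked example shows the boundary effect and the row_regex idea. The byte counts are hypothetical. Two-decimal rounding of bytes/1024 is an assumption about how the size is pretty-printed. Under that assumption, the 2.32KB/2.33KB boundary sits at 2.325 * 1024 = 2380.8 bytes, so files of 2380 and 2381 bytes land on different sides of it. The regex uses Python syntax only to show the pattern; it is not the exact .test file entry.

```python
import re


def pretty_kb(num_bytes):
    """Format a byte count as KB with two decimals (assumed rounding rule)."""
    return "%.2fKB" % (num_bytes / 1024)


# "-SNAPSHOT" is one character longer than "-RELEASE".
VERSION_DELTA = len("x.y.z-SNAPSHOT") - len("x.y.z-RELEASE")

# Hypothetical file sizes on either side of the 2380.8-byte boundary.
print(pretty_kb(2381), pretty_kb(2381 - VERSION_DELTA))  # 2.33KB 2.32KB

# Accept either size; the location is matched loosely because the database
# name is unique per test run.
ROW_PATTERN = re.compile(
    r"2,1,'2\.3[23]KB','NOT CACHED','NOT CACHED','PARQUET','false',"
    r"'.*/ice_alltypes'")
```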