[jira] [Resolved] (IMPALA-955) Implement the BYTES built-in
[ https://issues.apache.org/jira/browse/IMPALA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Yogi Lodha resolved IMPALA-955.
--------------------------------------
Fix Version/s: Impala 4.1.0
Resolution: Fixed
Status: Resolved

> Implement the BYTES built-in
> ----------------------------
>
> Key: IMPALA-955
> URL: https://issues.apache.org/jira/browse/IMPALA-955
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 1.3
> Reporter: David Z. Chen
> Assignee: Pranav Yogi Lodha
> Priority: Minor
> Labels: built-in-function, newbie, ramp-up
> Fix For: Impala 4.1.0
>
> Implement the BYTES built-in:
> http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Attribute_Functions.089.02.html

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-955) Implement the BYTES built-in
[ https://issues.apache.org/jira/browse/IMPALA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490706#comment-17490706 ]

ASF subversion and git services commented on IMPALA-955:
--------------------------------------------------------
Commit bde995483a1b6e91dc5d089dfc07225a93d7c8ca in impala's branch refs/heads/master from pranav.lodha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bde9954 ]

IMPALA-955: BYTES built-in function

The BYTES function returns the number of bytes contained in the specified byte string. The change touches 4 files. A few test cases are added in be/src/exprs/expr-test.cc, along with an end-to-end test in testdata/workloads/functional-query/queries/QueryTest/exprs.test.

Change-Id: I0bd06c3d6dba354d71f63c649eaa8f9f74d266ee
Reviewed-on: http://gerrit.cloudera.org:8080/18210
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
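As a rough illustration of what "number of bytes contained in the specified byte string" means, here is a minimal Python sketch. It is not the Impala implementation: the function name `bytes_len`, the UTF-8 encoding of text input, and the NULL-in/NULL-out behaviour are all assumptions made for the example.

```python
def bytes_len(value):
    """Hypothetical model of BYTES(): count bytes, not characters.

    Assumptions (not taken from the commit): text is encoded as UTF-8,
    and a NULL (None) input yields NULL (None).
    """
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return len(value)


if __name__ == "__main__":
    print(bytes_len("impala"))   # 6 ASCII characters, 6 bytes
    print(bytes_len("\u00e9"))   # one character, 2 bytes in UTF-8
    print(bytes_len(None))       # None
```

The point of the sketch is only that byte length and character length diverge for multi-byte characters.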
[jira] [Commented] (IMPALA-11097) Execute sometimes fails in call to Hive in test framework
[ https://issues.apache.org/jira/browse/IMPALA-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490705#comment-17490705 ]

ASF subversion and git services commented on IMPALA-11097:
----------------------------------------------------------
Commit 677d4f91a30e6f12d99b2422514c50d0bb7c799f in impala's branch refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=677d4f9 ]

IMPALA-11097: In test framework, call HS2 execute synchronously

Changed the HS2 call to be synchronous. The previous code had a race condition because wait_to_finish needs to be called before checking the result set for Hive. Calling execute synchronously for HS2 ensures that the result set is ready.

Change-Id: I5ab4b90ba2e1a439119d37fe9fb9c55eeeb53ba0
Reviewed-on: http://gerrit.cloudera.org:8080/18133
Reviewed-by: Csaba Ringhofer
Tested-by: Csaba Ringhofer

> Execute sometimes fails in call to Hive in test framework
> ---------------------------------------------------------
>
> Key: IMPALA-11097
> URL: https://issues.apache.org/jira/browse/IMPALA-11097
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Steve Carlin
> Priority: Major
>
> Hive can fail if you call fetch before the execute succeeds. We should call
> wait_to_finish before fetching any results.
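The race described in the commit message can be sketched with a toy model. Everything below is hypothetical: `FakeHiveOperation`, `execute_async` and `execute_sync` are stand-ins invented for this example and are not the test framework's real classes. Only the name `wait_to_finish` comes from the message above.

```python
class FakeHiveOperation:
    """Hypothetical stand-in for an HS2 operation handle."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._finished = False

    def wait_to_finish(self):
        # In a real client this would poll the server until the query is done.
        self._finished = True

    def fetch(self):
        # Models the failure mode: fetching before execution has finished.
        if not self._finished:
            raise RuntimeError("fetch called before execute finished")
        return list(self._rows)


def execute_async(rows):
    """Returns immediately; the caller must remember to wait."""
    return FakeHiveOperation(rows)


def execute_sync(rows):
    """Waits for completion before handing the operation back."""
    op = execute_async(rows)
    op.wait_to_finish()
    return op


if __name__ == "__main__":
    print(execute_sync(["row1", "row2"]).fetch())
    try:
        execute_async(["row1"]).fetch()
    except RuntimeError as err:
        print("async path failed:", err)
```

Making the call synchronous removes the window in which a caller can fetch too early, which is the design choice the commit describes.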
[jira] [Commented] (IMPALA-11072) TestSpillingDebugActionDimensions.test_spilling is flaky
[ https://issues.apache.org/jira/browse/IMPALA-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490588#comment-17490588 ]

Riza Suminto commented on IMPALA-11072:
---------------------------------------
Hi [~stigahuang], I've seen some flakiness in the downstream build for this exact test case. There seems to be an inconsistent number of fragments assigned to each impalad because a different Parquet file count/size is created on each run. I think it is better to investigate this in a separate JIRA.

> TestSpillingDebugActionDimensions.test_spilling is flaky
> --------------------------------------------------------
>
> Key: IMPALA-11072
> URL: https://issues.apache.org/jira/browse/IMPALA-11072
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 4.0.0
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 4.1.0
>
> We have seen some failures of TestSpillingDebugActionDimensions.test_spilling
> in the GVO jenkins job and downstream nightly tests. The latest one happened in
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15503/]
>
> {code:java}
> query_test/test_spilling.py:75: in test_spilling
>     self.run_test_case('QueryTest/spilling', vector)
> common/impala_test_suite.py:743: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:636: in verify_runtime_profile
>     actual))
> E   AssertionError: Did not find matches for lines in runtime profile:
> E   EXPECTED LINES:
> E   row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
> E
> E   ACTUAL PROFILE:
> E   Query (id=8b433ac02c4d3fd2:3c50b7c4):
> E      - InactiveTotalTime: 0.000ns
> E      - TotalTime: 0.000ns
> E     Summary:
> E       Session ID: 9448bded8acf05c6:428a7a797f6b9483
> E       Session Type: BEESWAX
> E       Start Time: 2022-01-08 09:37:07.647285000
> E       End Time: 2022-01-08 09:37:15.514936000
> E       Query Type: QUERY
> E       Query State: FINISHED
> E       Impala Query State: FINISHED
> E       Query Status: OK
> E       Impala Version: impalad version 4.1.0-SNAPSHOT RELEASE (build 560ff976d3a08920a08b4ce3325a1dd9dbe81765)
> E       User: ubuntu
> E       Connected User: ubuntu
> E       Delegated User:
> E       Network Address: :::127.0.0.1:44648
> E       Default Db: tpch_parquet
> E       Sql Statement: select count(l1.l_tax)
> E   from
> E     lineitem l1,
> E     lineitem l2,
> E     lineitem l3
> E   where
> E     l1.l_tax < 0.01 and
> E     l2.l_tax < 0.04 and
> E     l1.l_orderkey = l2.l_orderkey and
> E     l1.l_orderkey = l3.l_orderkey and
> E     l1.l_comment = l3.l_comment and
> E     l1.l_shipdate = l3.l_shipdate
> E       Coordinator: ip-172-31-21-231:27000
> E       Query Options (set by configuration): BUFFER_POOL_LIMIT=225443840,MT_DOP=0,DEFAULT_SPILLABLE_BUFFER_SIZE=262144,TIMEZONE=Universal,CLIENT_IDENTIFIER=query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:beeswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none]
> E       Query Options (set by configuration and planner): BUFFER_POOL_LIMIT=225443840,MT_DOP=0,DEFAULT_SPILLABLE_BUFFER_SIZE=262144,TIMEZONE=Universal,CLIENT_IDENTIFIER=query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:beeswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none],MINMAX_FILTER_THRESHOLD=0.5,MINMAX_FILTERING_LEVEL=PAGE
> ...{code}
>
> We should lower the configured BUFFER_POOL_LIMIT for this test to less than
> 215MB so that it spills more consistently.
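For readers unfamiliar with the `row_regex` expectation in the trace above, the following Python sketch shows what the pattern accepts and rejects. The pattern is copied from the trace; the per-line matching in `has_spilled` and the sample profile lines are assumptions made for illustration, not the verifier's actual code or real profile output.

```python
import re

# Pattern copied from the EXPECTED LINES section of the failure above.
SPILLED = re.compile(r".*SpilledPartitions: .* \([1-9][0-9]*\)")


def has_spilled(profile_text):
    """Return True if any line reports a non-zero SpilledPartitions count.

    Matching line by line with re.match is an assumption about how a
    row_regex expectation is applied.
    """
    return any(SPILLED.match(line) for line in profile_text.splitlines())


# Hypothetical profile fragments, shaped like counter lines in a profile.
no_spill = "   - SpilledPartitions: 0 (0)"
spill = "   - SpilledPartitions: 4 (4)"

if __name__ == "__main__":
    print(has_spilled(no_spill))  # False
    print(has_spilled(spill))     # True
```

Because the parenthesised count must start with a digit from 1 to 9, a run in which the query never spills cannot satisfy the expectation, which is why the test fails when the buffer pool limit is too generous.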
[jira] [Commented] (IMPALA-11072) TestSpillingDebugActionDimensions.test_spilling is flaky
[ https://issues.apache.org/jira/browse/IMPALA-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490585#comment-17490585 ]

Quanlong Huang commented on IMPALA-11072:
-----------------------------------------
Saw this again in an unrelated change: [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5248/]
{code:java}
query_test/test_spilling.py:75: in test_spilling
    self.run_test_case('QueryTest/spilling', vector)
common/impala_test_suite.py:743: in run_test_case
    update_section=pytest.config.option.update_results)
common/test_result_verifier.py:636: in verify_runtime_profile
    actual))
E   AssertionError: Did not find matches for lines in runtime profile:
E   EXPECTED LINES:
E   row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
E
E   ACTUAL PROFILE:
E   Query (id=6d47a6323a1d674b:133714c1):
E     DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance.
E      - InactiveTotalTime: 0.000ns
E      - TotalTime: 0.000ns
E     Summary:
E       Session ID: d446ae1fd4c1316d:f637b568190aa0ba
E       Session Type: BEESWAX
E       Start Time: 2022-02-10 14:08:19.382722000
E       End Time: 2022-02-10 14:08:39.128335000
E       Query Type: QUERY
E       Query State: FINISHED
E       Impala Query State: FINISHED
E       Query Status: OK
E       Impala Version: impalad version 4.1.0-SNAPSHOT DEBUG (build 4e3271faf44433c5d3f847a0f965ab4ef1b1a48d)
E       User: ubuntu
E       Connected User: ubuntu
E       Delegated User:
E       Network Address: 172.18.0.1:41204
E       Default Db: tpch_parquet
E       Sql Statement: SELECT straight_join o_orderkey
E   FROM (
E     SELECT *
E     FROM orders
E     JOIN customer ON o_custkey = c_custkey
E     JOIN nation ON c_nationkey = n_nationkey
E     JOIN region ON n_regionkey = r_regionkey
E     WHERE o_orderkey < 50) o1
E   LEFT ANTI JOIN /*+broadcast*/ (
E     SELECT *
E     FROM orders
E     JOIN customer ON o_custkey = c_custkey
E     JOIN nation ON c_nationkey = n_nationkey
E     JOIN region ON n_regionkey = r_regionkey
E     WHERE o_orderkey < 50) o2 ON o1.o_orderkey = o2.o_orderkey
E   AND o1.o_custkey = o2.o_custkey
E   AND o1.o_orderstatus = o2.o_orderstatus
E   AND o1.o_totalprice = o2.o_totalprice
E   AND o1.o_orderdate = o2.o_orderdate
E   AND o1.o_orderpriority = o2.o_orderpriority
E   AND o1.o_clerk = o2.o_clerk
E   AND o1.o_shippriority = o2.o_shippriority
E   AND o1.o_comment = o2.o_comment
E   AND o1.c_custkey = o2.c_custkey
E   AND o1.c_name = o2.c_name
E   AND o1.c_address = o2.c_address
E   AND o1.c_nationkey = o2.c_nationkey
E   AND o1.c_phone = o2.c_phone
E   AND o1.c_acctbal = o2.c_acctbal
E   AND o1.c_mktsegment = o2.c_mktsegment
E   AND o1.n_nationkey = o2.n_nationkey
E   AND o1.n_name = o2.n_name
E   AND o1.n_regionkey = o2.n_regionkey
E   AND o1.n_comment = o2.n_comment
E   AND o1.r_name = o2.r_name
E   AND o1.r_comment = o2.r_comment
E   AND fnv_hash(o1.n_name) = fnv_hash(o2.n_name)
E   AND fnv_hash(o1.r_name) = fnv_hash(o2.r_name)
E   AND fnv_hash(o1.o_orderstatus) = fnv_hash(o2.o_orderstatus)
E   AND fnv_hash(o1.o_shippriority) = fnv_hash(o2.o_shippriority)
E   AND fnv_hash(o1.o_orderdate) = fnv_hash(o2.o_orderdate)
E   AND fnv_hash(o1.o_orderpriority) = fnv_hash(o2.o_orderpriority)
E   AND fnv_hash(o1.o_clerk) = fnv_hash(o2.o_clerk)
E   ORDER BY o_orderkey
E       Coordinator: 172.18.0.4:27000
E       Query Options (set by configuration): BUFFER_POOL_LIMIT=115343360,RUNTIME_FILTER_MODE=OFF,MT_DOP=0,DEFAULT_SPILLABLE_BUFFER_SIZE=262144,TIMEZONE=UTC,CLIENT_IDENTIFIER=query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:beeswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none]
E       Query Options (set by configuration and planner): BUFFER_POOL_LIMIT=115343360,RUNTIME_FILTER_MODE=OFF,MT_DOP=0,DEFAULT_SPILLABLE_BUFFER_SIZE=262144,TIMEZONE=UTC,CLIENT_IDENTIFIER=query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:beeswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none]
E       Plan:
...{code}
This is another query. We probably need to set another BUFFER_POOL_LIMIT for it. Should we reopen this Jira or create another one?
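The BUFFER_POOL_LIMIT values in these reports are given in bytes, while the discussion talks in megabytes. A small conversion sketch, assuming 1 MB here means 1024 * 1024 bytes (the helper name `to_mib` is made up for this example):

```python
MIB = 1024 * 1024  # assumption: "MB" in the discussion means mebibytes


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


if __name__ == "__main__":
    # Values copied from the Query Options lines in the two failure reports.
    print(to_mib(225443840))  # 215.0
    print(to_mib(115343360))  # 110.0
    # DEFAULT_SPILLABLE_BUFFER_SIZE=262144 is the '256k' from the test vector.
    print(262144 // 1024)     # 256
```

Under that assumption the first failing query ran with a 215 MB limit and the second, different query with a 110 MB limit, so each query would need its own lowered limit.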
[jira] [Assigned] (IMPALA-10948) Impala shouldn't require DECIMAL scale for Parquet files
[ https://issues.apache.org/jira/browse/IMPALA-10948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gergely Fürnstáhl reassigned IMPALA-10948:
------------------------------------------
Assignee: Gergely Fürnstáhl

> Impala shouldn't require DECIMAL scale for Parquet files
> --------------------------------------------------------
>
> Key: IMPALA-10948
> URL: https://issues.apache.org/jira/browse/IMPALA-10948
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Gergely Fürnstáhl
> Priority: Major
> Labels: ramp-up
>
> Impala requires the 'scale' to be set for decimal columns:
> https://github.com/apache/impala/blob/1a61a8025c87c37921a1bba4c49f754d8bd10bcc/be/src/exec/parquet/parquet-metadata-utils.cc#L332
> But it is only an optional field in Parquet's
> [SchemaElement|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L392]
> and the
> [docs|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal]
> say that if the scale is unspecified then it should be considered to be 0.
> Then there's the new logical type annotation
> [DecimalType|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L253],
> but Impala doesn't use it during scans.
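The requested behaviour (treat a missing scale as 0 instead of rejecting the file) can be sketched in a few lines of Python. The `SchemaElement` dataclass below is a simplified, hypothetical mirror of the Parquet Thrift struct, and `effective_scale` is a name invented for this example; neither is Impala code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaElement:
    """Hypothetical, simplified stand-in for Parquet's SchemaElement."""
    name: str
    precision: Optional[int] = None
    scale: Optional[int] = None  # optional in the Parquet schema


def effective_scale(elem):
    """Fall back to scale 0 when the file does not specify one."""
    return 0 if elem.scale is None else elem.scale


if __name__ == "__main__":
    print(effective_scale(SchemaElement("price", precision=9, scale=2)))  # 2
    print(effective_scale(SchemaElement("qty", precision=9)))             # 0
```

Checking for `None` rather than falsiness keeps an explicitly written scale of 0 distinct from an absent one, even though both resolve to the same value here.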
[jira] [Assigned] (IMPALA-10946) RECOVER PARTITIONS might create non-existing partitions
[ https://issues.apache.org/jira/browse/IMPALA-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gergely Fürnstáhl reassigned IMPALA-10946:
------------------------------------------
Assignee: Gergely Fürnstáhl

> RECOVER PARTITIONS might create non-existing partitions
> -------------------------------------------------------
>
> Key: IMPALA-10946
> URL: https://issues.apache.org/jira/browse/IMPALA-10946
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog, Frontend
> Reporter: Zoltán Borók-Nagy
> Assignee: Gergely Fürnstáhl
> Priority: Major
> Labels: ramp-up
>
> The following commands reproduce the bug:
> {noformat}
> create table test_table (id int)
> partitioned by (part_field string)
> stored as parquet
> LOCATION '/test-warehouse/abc/test';
> insert into test_table (id, part_field) select 1, 'abc+';
> show partitions test_table;
> -- shows one partition: "abc+"
> alter table test_table recover partitions;
> show partitions test_table;
> -- now shows two partitions: "abc" and "abc+"
> {noformat}
> The + character can occur anywhere in the string; RECOVER PARTITIONS will
> create a partition where the + is replaced by a space.
> It seems that other characters don't cause this bug.
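The report does not say where the replacement happens. One common way to get exactly this symptom is decoding a path component with form-style URL decoding, which maps '+' to a space. The Python sketch below only illustrates that general pitfall with the standard library; that this is the actual cause in Impala is an assumption, and the directory name is hypothetical.

```python
from urllib.parse import quote, unquote, unquote_plus

# Hypothetical partition directory name as it might appear on disk.
raw_dir = "part_field=abc+"

# Form-style decoding turns '+' into a space, yielding a different value.
form_decoded = unquote_plus(raw_dir)

# Plain percent-decoding leaves a literal '+' alone.
path_decoded = unquote(raw_dir)

# If '+' is percent-encoded when the path is written, both decoders agree.
encoded_dir = "part_field=" + quote("abc+", safe="")

if __name__ == "__main__":
    print(repr(form_decoded))               # 'part_field=abc '
    print(repr(path_decoded))               # 'part_field=abc+'
    print(encoded_dir)                      # part_field=abc%2B
    print(repr(unquote_plus(encoded_dir)))  # 'part_field=abc+'
```

If the partition value is recovered with the form-style decoder, the recovered name no longer matches the existing partition, so a second partition would be registered.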
[jira] [Commented] (IMPALA-6636) Use async IO in ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490114#comment-17490114 ]

Quanlong Huang commented on IMPALA-6636:
----------------------------------------
Thanks to [~rizaon] and [~csringhofer] for getting this done! Great work!

> Use async IO in ORC scanner
> ---------------------------
>
> Key: IMPALA-6636
> URL: https://issues.apache.org/jira/browse/IMPALA-6636
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Riza Suminto
> Priority: Critical
>
> Though ORC-262 has made no progress, we can still prefetch data and let the ORC lib
> read from an in-memory InputStream.
[jira] [Commented] (IMPALA-6636) Use async IO in ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490113#comment-17490113 ]

ASF subversion and git services commented on IMPALA-6636:
---------------------------------------------------------
Commit 97dda2b27da99367f4d07699aa046b16cda16dd4 in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97dda2b ]

IMPALA-6636: Use async IO in ORC scanner

This patch implements async IO in the ORC scanner. For each ORC stripe, we begin by iterating over the column streams. If a column stream is eligible for async IO, the scanner creates a ColumnRange, registers a ScannerContext::Stream for that ORC stream, and starts the stream. We modify HdfsOrcScanner::ScanRangeInputStream::read to check whether there is a matching ColumnRange for the given offset and length. If so, the read continues through HdfsOrcScanner::ColumnRange::read.

We leverage existing async IO methods from the HdfsParquetScanner class for initial memory allocations. We moved related methods such as DivideReservationBetweenColumns and ComputeIdealReservation up to the HdfsColumnarScanner class.

The planner calculates the memory reservation differently for async Parquet and async ORC. In async Parquet, the planner calculates the column memory reservation and relies on the backend to divide it as needed. In async ORC, the planner needs to split the column's memory reservation based on the estimated number of streams for that column type. For example, a string column with a 4MB memory estimate needs that estimate split into four 1MB reservations because it might use dictionary encoding with four streams (PRESENT, DATA, DICTIONARY_DATA, and LENGTH). This splitting is required because each async IO stream needs to start with an 8KB (min_buffer_size) initial memory reservation.

To show the improvement from ORC async IO, we contrast the total time and geomean (in milliseconds) to run full TPC-DS 10 TB, 19 executors, with varying ORC_ASYNC_READ and DISABLE_DATA_CACHE options as follows:

+----------------------+------------------+------------------+
| Total time           | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 | 3511075          | 3484736          |
| DISABLE_DATA_CACHE=1 | 5243337          | 4370095          |
+----------------------+------------------+------------------+

+----------------------+------------------+------------------+
| Geomean              | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 | 12786.58042      | 12454.80365      |
| DISABLE_DATA_CACHE=1 | 23081.10888      | 16692.31512      |
+----------------------+------------------+------------------+

Testing:
- Pass core tests.
- Pass core e2e tests with ORC_ASYNC_READ=1.

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Reviewed-on: http://gerrit.cloudera.org:8080/15370
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
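The per-stream reservation split described in the commit message above can be sketched as follows. This is a hedged illustration, not the planner's code: `split_reservation` and `STREAMS_PER_TYPE` are hypothetical names, the even split is an assumption, and clamping each share to the 8KB minimum is an assumption drawn from the statement that each stream starts with at least min_buffer_size.

```python
MIN_BUFFER_SIZE = 8 * 1024  # 8KB initial reservation per async IO stream

# Only the string entry comes from the commit message (PRESENT, DATA,
# DICTIONARY_DATA, LENGTH). Any other entry would be a guess, so none is given.
STREAMS_PER_TYPE = {"string": 4}


def split_reservation(column_estimate_bytes, num_streams):
    """Divide a column's memory estimate evenly across its streams.

    Each share is raised to MIN_BUFFER_SIZE if the even split falls below it
    (an assumption made for this sketch).
    """
    per_stream = column_estimate_bytes // num_streams
    return [max(per_stream, MIN_BUFFER_SIZE)] * num_streams


if __name__ == "__main__":
    four_mb = 4 * 1024 * 1024
    shares = split_reservation(four_mb, STREAMS_PER_TYPE["string"])
    print(shares)  # four entries of 1048576 bytes (1MB each)
```

With a 4MB estimate and four streams this reproduces the four 1MB reservations from the example in the message; a very small estimate would instead be rounded up to four 8KB reservations under the clamping assumption.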