from:"Tim Armstrong \(JIRA\)"

[jira] [Commented] (IMPALA-5165) Allocate memory for all data from BufferPool

2021-10-18 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430319#comment-17430319
 ] 

Tim Armstrong commented on IMPALA-5165:
---

[~Xinyi Zou] it has been a while since I worked on this, but the work we did 
on memory management ended up doing a lot to solve the problem even without 
converting everything.

> Allocate memory for all data from BufferPool
> 
>
> Key: IMPALA-5165
> URL: https://issues.apache.org/jira/browse/IMPALA-5165
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> Eventually we should back RowBatches and other runtime memory (e.g. MemPools, 
> FreePools, compression buffers, etc) with memory from BufferPool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Resolved] (IMPALA-5165) Allocate memory for all data from BufferPool

2021-10-18 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5165.
---
Resolution: Later

I think in the end this wasn't all that important - allocating small amounts of 
memory from MemPool is OK - it turned out that moving the large allocations to 
BufferPool was generally sufficient.


[jira] [Created] (IMPALA-10506) Check if Impala LZ4 has same bug as ARROW-11301

2021-02-12 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10506:
--

 Summary: Check if Impala LZ4 has same bug as ARROW-11301
 Key: IMPALA-10506
 URL: https://issues.apache.org/jira/browse/IMPALA-10506
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Tim Armstrong
Assignee: Csaba Ringhofer


I noticed ARROW-11301 in the context of a Parquet discussion 
(https://github.com/apache/parquet-format/pull/164/files/2dfe463c948948f7d9624bee3cdd4706eb3488b5#diff-a1727652430ce24c121536393f2ece63c5799a99583738f48aa8bb9fa71cb3f8)
 and wondered whether Impala has made the same mistake.

CC [~arawat] [~csringhofer] [~boroknagyz]




[jira] [Resolved] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9382.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better off storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.
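
A minimal sketch of the two ideas in the description above (a flat counter list 
that is sorted lazily, and a shared pool of counter names), written in Python 
for brevity rather than in Impala's C++. The CounterSet class and its methods 
are hypothetical names, not part of the RuntimeProfile API, and sys.intern 
stands in for the proposed pool of constant names.

```python
import sys


class CounterSet:
    """Hypothetical stand-in for a profile node's counters (not Impala code)."""

    def __init__(self):
        # One flat list instead of a node-per-entry ordered map.
        self._counters = []
        self._is_sorted = True

    def add(self, name, value=0):
        # Interning makes equal names share a single string object,
        # mimicking a pool of constant counter names.
        self._counters.append((sys.intern(name), value))
        self._is_sorted = False

    def items_in_order(self):
        # Sort only when ordered output is requested and the list
        # has changed since the last sort.
        if not self._is_sorted:
            self._counters.sort(key=lambda entry: entry[0])
            self._is_sorted = True
        return list(self._counters)


profile = CounterSet()
for counter_name in ("RowsReturned", "BytesRead", "PeakMemoryUsage"):
    profile.add(counter_name, 0)
print([name for name, _ in profile.items_in_order()])
```

Because the set of counters rarely changes after setup, the sort in this sketch 
would normally run once and later calls would reuse the sorted list.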




[jira] [Resolved] (IMPALA-10032) Unable to close the connection when fetching data from two databases

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10032.

Resolution: Cannot Reproduce

I tried to understand what the issue was here. Attaching a test program after 
doing the work to clean up and reformat your code (please try to actually 
provide usable examples and don't make us poor maintainers work to understand 
what you're talking about).

It works fine if the SQL statements succeed.

If you don't clean up the connections by calling close() on an error, the 
connection is left open. The best practice is to have a finally block that 
closes the connections in the event of an error.
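
The cleanup pattern described above, sketched in Python with the 
standard-library sqlite3 module standing in for the Impala JDBC driver. That 
substitution is an assumption made so the example is self-contained; in Java 
the equivalent is a finally block or try-with-resources around the Connection. 
The helper names run_statements and connect_in_memory are hypothetical.

```python
import sqlite3


def run_statements(connect, statements):
    """Run each statement on a new connection and always close it."""
    conn = connect()
    try:
        cursor = conn.cursor()
        results = []
        for sql in statements:
            cursor.execute(sql)
            results.append(cursor.fetchall())
        return results
    finally:
        # Runs on success and on error, so a failed statement
        # cannot leave the connection open.
        conn.close()


def connect_in_memory():
    # Stand-in for opening a connection to one of the two databases.
    return sqlite3.connect(":memory:")


print(run_statements(connect_in_memory, ["select 1"]))
try:
    run_statements(connect_in_memory, ["select * from missing_table"])
except sqlite3.OperationalError as err:
    print("query failed, connection already closed:", err)
```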

> Unable to close the connection when fetching data from two databases
> 
>
> Key: IMPALA-10032
> URL: https://issues.apache.org/jira/browse/IMPALA-10032
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: jayashree
>Assignee: Tim Armstrong
>Priority: Blocker
> Attachments: JDBCExample2.java
>
>
> Hi Team,
> I am connecting two databases using cloudera.impala.jdbc41.Driver.
> I have two classes with two different connections, each having different SQLs 
> to perform in respective database.
> When I am executing these two classes together, I am getting below error 
> though I am closing first connection and then connecting to other Database.
> Error: 
> *java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing 
> query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, 
> sqlState:HY000, errorMessage:AnalysisException: Could not resolve table 
> reference:* 
> *Caused by: java.sql.SQLException:*
> *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:AnalysisException: Could not resolve table reference:* 
> *Caused by: com.cloudera.support.exceptions.GeneralException:*
> *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:AnalysisException: Could not resolve table reference:* 
>  
> *Seems its not able to close first connection.*
> *Can you please check it*




[jira] [Commented] (IMPALA-10032) Unable to close the connection when fetching data from two databases

2021-02-11 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283281#comment-17283281
 ] 

Tim Armstrong commented on IMPALA-10032:


javac JDBCExample2.java && 
CLASSPATH=.:~/ClouderaImpalaJDBC-2.6.3.1004/ImpalaJDBC41.jar java JDBCExample2


[jira] [Updated] (IMPALA-10032) Unable to close the connection when fetching data from two databases

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10032:
---
Attachment: JDBCExample2.java


[jira] [Assigned] (IMPALA-10339) Apparent hang or crash in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-10339:
--

Assignee: Wenzhe Zhou  (was: Tim Armstrong)

> Apparent hang or crash in 
> TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
> ---
>
> Key: IMPALA-10339
> URL: https://issues.apache.org/jira/browse/IMPALA-10339
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Tim Armstrong
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, flaky, hang
>
> Release build with this commit as the tip:
> {noformat}
> commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging)
> Author: wzhou-code 
> Date:   Thu Oct 29 22:32:03 2020 -0700
> IMPALA-10305: Sync Kudu's FIPS compliant changes
> 
> {noformat}
> {noformat}
> Regression
> query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol:
>  beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} 
> | table_format: parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#100 )
> Took 1 hr 59 min.
> Error Message
> query_test/test_spilling.py:134: in test_spilling_no_debug_action 
> self.run_test_case('QueryTest/spilling-no-debug-action', vector) 
> common/impala_test_suite.py:668: in run_test_case 
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
> common/impala_test_suite.py:485: in __verify_exceptions (expected_str, 
> actual_str) E   AssertionError: Unexpected exception string. Expected: 
> row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did 
> not reduce the size of a spilled partition.* E   Not found in actual: Timeout 
> >7200s
> Stacktrace
> query_test/test_spilling.py:134: in test_spilling_no_debug_action
> self.run_test_case('QueryTest/spilling-no-debug-action', vector)
> common/impala_test_suite.py:668: in run_test_case
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:485: in __verify_exceptions
> (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot 
> perform hash join at node with id .*. Repartitioning did not reduce the size 
> of a spilled partition.*
> E   Not found in actual: Timeout >7200s
> Standard Error
> SET 
> client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> -- executing against localhost:21000
> use tpch_parquet;
> -- 2020-11-11 23:12:04,319 INFO MainThread: Started query 
> c740c1c66d9679a9:6a40f161
> SET 
> client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> SET mt_dop=0;
> SET default_spillable_buffer_size=64k;
> -- 2020-11-11 23:12:04,320 INFO MainThread: Loading query test file: 
> /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
> -- 2020-11-11 23:12:04,323 INFO MainThread: Starting new HTTP connection 
> (1): localhost
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:04,377 INFO MainThread: Started query 
> c044afcf5ae44df9:a2e2e7c6
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a, lineitem b
> where
> a.l_partkey = 1 and
> a.l_orderkey = b.l_orderkey;
> -- 2020-11-11 23:12:04,385 INFO MainThread: Started query 
> 314c019cd252f322:2411bc76
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:05,199 INFO MainThread: Started query 
> 80424e68922c30f9:b2144dff
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:05,207 INFO MainThread: Started query 
> 2a4c1f4b26ea52da:4339f3ff
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a
> where
> a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10)
> and a.l_partkey < 1000;
> -- 2020-11-11 23:12:05,215 INFO MainThread: Started query 
> f845afd00a569446:79c5054a
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:07,507 INFO MainThread: Started query 
>

[jira] [Assigned] (IMPALA-10301) Insert query hangs in test_local_catalog_ddls_with_invalidate_metadata_sync_ddl

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-10301:
--

Assignee: Wenzhe Zhou  (was: Tim Armstrong)

> Insert query hangs in 
> test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
> ---
>
> Key: IMPALA-10301
> URL: https://issues.apache.org/jira/browse/IMPALA-10301
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Qifan Chen
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, flaky, hang
> Attachments: failure-output.txt, 
> test_mixed_catalog_ddls_with_invalidate_metadata.tar.gz
>
>
> In impala-cdpd-master-staging-core-s3 test, the following error was seen 
> (insert into a partitioned table timeout). 
> Error Message
> {code:java}
> AssertionError: Query timeout(60s): insert overwrite table 
> test_local_catalog_ddls_with_invalidate_metadata_sync_ddl_b87f02d6.test_2_part
>  partition(j=2) values (1), (2), (3), (4), (5) assert False
> Stacktrace
> {code}
> {code:java}
> custom_cluster/test_concurrent_ddls.py:83: in 
> test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
> self._run_ddls_with_invalidation(unique_database, sync_ddl=True)
> custom_cluster/test_concurrent_ddls.py:146: in _run_ddls_with_invalidation
> for i in pool.imap_unordered(run_ddls, xrange(1, NUM_ITERS + 1)):
> /usr/lib64/python2.7/multiprocessing/pool.py:655: in next
> raise value
> E   AssertionError: Query timeout(60s): insert overwrite table 
> test_local_catalog_ddls_with_invalidate_metadata_sync_ddl_b87f02d6.test_2_part
>  partition(j=2) values (1), (2), (3), (4), (5)
> E   assert False
> {code}
> The URL is 
> https://master-02.jenkins.cloudera.com/view/Impala/view/Evergreen-cdpd-master-staging/job/impala-cdpd-master-staging-core-s3/lastCompletedBuild/testReport/custom_cluster.test_concurrent_ddls/TestConcurrentDdls/test_local_catalog_ddls_with_invalidate_metadata_sync_ddl/.
>  




[jira] [Resolved] (IMPALA-10470) Update wiki and README with info about Impala quickstart

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10470.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Update wiki and README with info about Impala quickstart
> 
>
> Key: IMPALA-10470
> URL: https://issues.apache.org/jira/browse/IMPALA-10470
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>





[jira] [Resolved] (IMPALA-9793) Improved Impala quickstart

2021-02-11 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9793.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Improved Impala quickstart
> --
>
> Key: IMPALA-9793
> URL: https://issues.apache.org/jira/browse/IMPALA-9793
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> Kudu built a single container quickstart here 
> https://github.com/apache/kudu/tree/master/examples/quickstart/impala .
> We should do a better quickstart container with the following features:
> * Store data in docker volumes
> * Use the daemon containers that are more production ready
> * Have an easy solution for loading data
> * Support Kudu and Hive tables.




[jira] [Created] (IMPALA-10501) Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_

2021-02-11 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10501:
--

 Summary: Hit DCHECK in parquet-column-readers.cc:  
def_levels_.CacheRemaining() <= num_buffered_values_
 Key: IMPALA-10501
 URL: https://issues.apache.org/jira/browse/IMPALA-10501
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.0
Reporter: Tim Armstrong
Assignee: Zoltán Borók-Nagy
 Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz

https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/

{noformat}
F0211 03:55:26.383247 14487 parquet-column-readers.cc:517] 
be46bb72819942fd:85934edd0001] Check failed: def_levels_.CacheRemaining() 
<= num_buffered_values_ (921 vs. 916) 
*** Check failure stack trace: ***
@  0x53646ec  google::LogMessage::Fail()
@  0x5365fdc  google::LogMessage::SendToLog()
@  0x536404a  google::LogMessage::Flush()
@  0x5367c48  google::LogMessageFatal::~LogMessageFatal()
@  0x2ff886f  
impala::ScalarColumnReader<>::MaterializeValueBatch<>()
@  0x2f8ae44  
impala::ScalarColumnReader<>::MaterializeValueBatch<>()
@  0x2f761bf  impala::ScalarColumnReader<>::ReadValueBatch<>()
@  0x2f2889a  impala::ScalarColumnReader<>::ReadValueBatch()
@  0x2ebd8c0  impala::HdfsParquetScanner::AssembleRows()
@  0x2eb882e  impala::HdfsParquetScanner::GetNextInternal()
@  0x2eb67bd  impala::HdfsParquetScanner::ProcessSplit()
@  0x2aaf3f2  impala::HdfsScanNode::ProcessSplit()
@  0x2aae773  impala::HdfsScanNode::ScannerThread()
@  0x2aadadb  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x2aafe94  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x220e331  boost::function0<>::operator()()
@  0x2842e7f  impala::Thread::SuperviseThread()
@  0x284ae1c  boost::_bi::list5<>::operator()<>()
@  0x284ad40  boost::_bi::bind_t<>::operator()()
@  0x284ad01  boost::detail::thread_data<>::run()
@  0x406b291  thread_proxy
@ 0x7f2465cba6b9  start_thread
@ 0x7f24627e64dc  clone
rImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866)
{noformat}

It was likely a fuzz test:
{noformat}
19:55:23 
query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
parquet/none] 
19:55:23 [gw5] PASSED 
query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
parquet/none] 
19:55:23 
query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
parquet/none] 
19:55:25 [gw2] PASSED 
query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
 beeswax | exec_option: {'mt_dop': 0, 'exec_single_node_rows_threshold': 0} | 
table_format: parquet/none] 
19:55:25 
query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
 beeswax | exec_option: {'mt_dop': 1, 'exec_single_node_rows_threshold': 0} | 
table_format: avro/snap/block] 
19:55:26 [gw5] PASSED 
query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0,

[jira] [Commented] (IMPALA-10470) Update wiki and README with info about Impala quickstart

2021-02-10 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282824#comment-17282824
 ] 

Tim Armstrong commented on IMPALA-10470:


I updated the front page of CWiki

> Update wiki and README with info about Impala quickstart
> 
>
> Key: IMPALA-10470
> URL: https://issues.apache.org/jira/browse/IMPALA-10470
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>





[jira] [Work started] (IMPALA-10470) Update wiki and README with info about Impala quickstart

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10470 started by Tim Armstrong.
--
> Update wiki and README with info about Impala quickstart
> 
>
> Key: IMPALA-10470
> URL: https://issues.apache.org/jira/browse/IMPALA-10470
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>





[jira] [Updated] (IMPALA-10470) Update wiki and README with info about Impala quickstart

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10470:
---
Summary: Update wiki and README with info about Impala quickstart  (was: 
Update wiki with info about Impala quickstart)

> Update wiki and README with info about Impala quickstart
> 
>
> Key: IMPALA-10470
> URL: https://issues.apache.org/jira/browse/IMPALA-10470
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>





[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10228:
---
Affects Version/s: Impala 4.0

> Avoid or codegen std::map comparisons in partitioned top-n
> --
>
> Key: IMPALA-10228
> URL: https://issues.apache.org/jira/browse/IMPALA-10228
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: codegen, performance
>
> The partitioned top-n implementation currently uses std::map to store the 
> heaps. We can't inline the tuple comparator easily because of the 
> indirections in the standard library code.




[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10228:
---
Labels: codegen performance  (was: )

> Avoid or codegen std::map comparisons in partitioned top-n
> --
>
> Key: IMPALA-10228
> URL: https://issues.apache.org/jira/browse/IMPALA-10228
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: codegen, performance
>
> The partitioned top-n implementation currently uses std::map to store the 
> heaps. We can't inline the tuple comparator easily because of the 
> indirections in the standard library code.




[jira] [Resolved] (IMPALA-9853) Push rank() predicates into sort

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9853.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed



Commit b42c64993d46893488a667fb9c425548fdf964ab in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b42c649 ]

IMPALA-9979: part 2: partitioned top-n

Planner changes:
---
The planner now identifies predicates that can be converted into
limits in a partitioned or unpartitioned top-n with the following
method:

* Push down predicates that reference analytic tuple into inline view.
  These will be evaluated after the analytic plan for the inline
  SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could
  be converted to limits.
* If they can be applied to the last sort group of the analytic
  plan, and the windows are all compatible, then the lowest
  limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add
  logic to merge SELECT nodes to avoid generating duplicates
  from inside and outside the inline view.
* The pushed predicate is still added to the SELECT node
  because it is necessary for correctness for predicates
  like '=' to filter additional rows and also the limit
  pushdown optimization looks for analytic predicates
  there, so retaining all predicates simplifies that.
* The selectivity of the predicate is adjusted so that
  cardinality estimates remain accurate.

The optimization can be disabled by setting
ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is
only enabled for limits of 1000 or less, because the
in-memory Top-N may perform significantly worse than
a full sort for large heaps (since updating the heap
for every input row ends up being more expensive than
doing a traditional sort). We could probably optimize
this more with better tuning so that it can gracefully
fall back to doing the full sort at runtime.

rank() and row_number() are handled. rank() needs support in
the TopN node to include ties for the last place, which is
also added in this patch.
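
A minimal sketch of what "include ties for the last place" means for a predicate such as rank() <= N. It is written in Python for brevity and is not the TopN node's implementation; the function name and the simple two-pass approach are assumptions made for illustration only.

```python
import heapq


def top_n_with_ties(rows, order_key, limit):
    """Return the rows whose rank() by ascending order_key is <= limit.

    Unlike a plain LIMIT, every row that ties with the last admitted
    row is kept, so the result can contain more than `limit` rows.
    """
    rows = list(rows)
    if limit <= 0 or not rows:
        return []
    # Key of the limit-th smallest row (or of the largest row if the
    # input has fewer than `limit` rows).
    cutoff = order_key(heapq.nsmallest(limit, rows, key=order_key)[-1])
    # A row has rank <= limit exactly when its key does not exceed the cutoff.
    return sorted((r for r in rows if order_key(r) <= cutoff), key=order_key)
```

With the input values 5, 1, 3, 3, 2, 3 and a limit of 3, the three rows with value 3 all have rank 3, so five rows are returned rather than three.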

If predicates are trivially false, we generate empty nodes.

This interacts with the limit pushdown optimization. The limit
pushdown optimization is applied after the partitioned top-n
is generated, and can sometimes result in more optimal plans,
so it is generalized to handle pushing into partitioned top-n
nodes.

Backend changes:
---
The top-n node in the backend is augmented to handle
the partitioned case, for which we use a std::map and a
comparator based on the partition exprs. The partitioned
top-n node has a soft limit of 64MB on the size of the
in-memory heaps and can spill with use of an embedded Sorter.
The current implementation tries to evict heaps that are
less effective at filtering rows.
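
A rough Python sketch of the per-partition heap idea described above, for row_number() <= N semantics. A dict stands in for the std::map keyed on the partition exprs, and the 64MB soft limit, heap eviction and spilling to a Sorter are left out entirely. All names are hypothetical, and the order key is assumed to be numeric so that it can be negated.

```python
import heapq
from collections import defaultdict


def partitioned_top_n(rows, partition_key, order_key, limit):
    """Keep the `limit` smallest rows per partition, by ascending order_key."""
    if limit <= 0:
        return {}
    # partition value -> bounded heap. Keys are negated so that heap[0]
    # is always the worst (largest-key) row currently kept.
    heaps = defaultdict(list)
    for seq, row in enumerate(rows):
        heap = heaps[partition_key(row)]
        # -seq breaks ties in favour of earlier rows and guarantees that
        # the row payload itself is never compared.
        entry = (-order_key(row), -seq, row)
        if len(heap) < limit:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new row beats the worst kept row, so replace it.
            heapq.heapreplace(heap, entry)
    return {
        part: [row for _, _, row in sorted(heap, reverse=True)]
        for part, heap in heaps.items()
    }
```

Each input row costs at most one heap update of size `limit`, which is why this pays off for small limits and can lose to a full sort for large ones.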

Limitations:
---
There are several possible extensions to this that we did not do:

* dense_rank() is not supported because it would require additional
  backend support - IMPALA-10014.
* ntile() is not supported because it would require additional
  backend support - IMPALA-10174.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged,
  only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic
  expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015

Tests:

* Planner tests - added tests that exercise the interesting code
  paths added in planning.
  * Predicate ordering in SELECT nodes changed in a couple of cases
    because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
  * Unpartitioned Top-N end-to-end tests
  * Basic partitioning and duplicate handling tests on functional
  * Similar basic tests on larger inputs from TPC-DS and with
    larger partition counts.
  * I inspected the results and also ran the same tests with
    analytic_rank_pushdown_threshold=0 to confirm that the
    results were the same as with the full sort.
  * Fallback to spilling sort.

Perf:

Added a targeted benchmark that goes from ~2s to ~1s with
mt_dop=8 on TPC-H 30 on my desktop.

Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Reviewed-on: http://gerrit.cloudera.org:8080/16242
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Push rank() predicates into sort
> 
>
> Key: IMPALA-9853
> URL: https://issues.apache.org/jira/browse/IMPALA-9853
> Project: IMPALA
>  Issue

[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10228:
---
Priority: Minor  (was: Major)

> Avoid or codegen std::map comparisons in partitioned top-n
> --
>
> Key: IMPALA-10228
> URL: https://issues.apache.org/jira/browse/IMPALA-10228
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: codegen, performance
>
> The partitioned top-n implementation currently uses std::map to store the 
> heaps. We can't inline the tuple comparator easily because of the 
> indirections in the standard library code.




[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10228:
---
Parent: (was: IMPALA-9853)
Issue Type: Improvement  (was: Sub-task)

> Avoid or codegen std::map comparisons in partitioned top-n
> --
>
> Key: IMPALA-10228
> URL: https://issues.apache.org/jira/browse/IMPALA-10228
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>
> The partitioned top-n implementation currently uses std::map to store the 
> heaps. We can't inline the tuple comparator easily because of the 
> indirections in the standard library code.




[jira] [Resolved] (IMPALA-2783) Push down filters on rank similar to limit

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2783.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

Fixed with 

Commit b42c64993d46893488a667fb9c425548fdf964ab in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b42c649 ]

IMPALA-9979: part 2: partitioned top-n

Planner changes:
---
The planner now identifies predicates that can be converted into
limits in a partitioned or unpartitioned top-n with the following
method:

* Push down predicates that reference analytic tuple into inline view.
  These will be evaluated after the analytic plan for the inline
  SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could
  be converted to limits.
* If they can be applied to the last sort group of the analytic
  plan, and the windows are all compatible, then the lowest
  limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add
  logic to merge SELECT nodes to avoid generating duplicates
  from inside and outside the inline view.
* The pushed predicate is still added to the SELECT node
  because it is necessary for correctness for predicates
  like '=' to filter additional rows and also the limit
  pushdown optimization looks for analytic predicates
  there, so retaining all predicates simplifies that.
* The selectivity of the predicate is adjusted so that
  cardinality estimates remain accurate.

The optimization can be disabled by setting
ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is
only enabled for limits of 1000 or less, because the
in-memory Top-N may perform significantly worse than
a full sort for large heaps (since updating the heap
for every input row ends up being more expensive than
doing a traditional sort). We could probably optimize
this more with better tuning so that it can gracefully
fall back to doing the full sort at runtime.

rank() and row_number() are handled. rank() needs support in
the TopN node to include ties for the last place, which is
also added in this patch.

If predicates are trivially false, we generate empty nodes.

This interacts with the limit pushdown optimization. The limit
pushdown optimization is applied after the partitioned top-n
is generated, and can sometimes result in more optimal plans,
so it is generalized to handle pushing into partitioned top-n
nodes.

Backend changes:
---
The top-n node in the backend is augmented to handle
the partitioned case, for which we use a std::map and a
comparator based on the partition exprs. The partitioned
top-n node has a soft limit of 64MB on the size of the
in-memory heaps and can spill with use of an embedded Sorter.
The current implementation tries to evict heaps that are
less effective at filtering rows.

Limitations:
---
There are several possible extensions to this that we did not do:

* dense_rank() is not supported because it would require additional
  backend support - IMPALA-10014.
* ntile() is not supported because it would require additional
  backend support - IMPALA-10174.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged,
  only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic
  expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015

Tests:

* Planner tests - added tests that exercise the interesting code
  paths added in planning.
  * Predicate ordering in SELECT nodes changed in a couple of cases
    because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
  * Unpartitioned Top-N end-to-end tests
  * Basic partitioning and duplicate handling tests on functional
  * Similar basic tests on larger inputs from TPC-DS and with
    larger partition counts.
  * I inspected the results and also ran the same tests with
    analytic_rank_pushdown_threshold=0 to confirm that the
    results were the same as with the full sort.
  * Fallback to spilling sort.

Perf:

Added a targeted benchmark that goes from ~2s to ~1s with
mt_dop=8 on TPC-H 30 on my desktop.

Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Reviewed-on: http://gerrit.cloudera.org:8080/16242
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Push down filters on rank similar to limit
> --
>
> Key: IMPALA-2783
> URL: https://issues.apache.org/jira/browse/IMPALA-2783
>

[jira] [Resolved] (IMPALA-9979) Backend partitioned top-n operator

2021-02-10 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9979.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Backend partitioned top-n operator
> --
>
> Key: IMPALA-9979
> URL: https://issues.apache.org/jira/browse/IMPALA-9979
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> This is to implement the backend support for the partitioned top-n operator.




[jira] [Resolved] (IMPALA-10469) Support pushing quickstart images to Apache repo

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10469.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Support pushing quickstart images to Apache repo
> 
>
> Key: IMPALA-10469
> URL: https://issues.apache.org/jira/browse/IMPALA-10469
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> We need a naming scheme and maybe a script to do the push. We've so far 
> assumed a different repository for each image, but in the Apache docker, we 
> only have a single repository and need to encode the image type and version 
> into the tag
> See  https://hub.docker.com/repository/docker/apache/kudu for an example.
> They have:
> apache/kudu:
> apache/kudu:kudu-python-
> apache/kudu:impala-latest
> Airflow does the opposite, and this might be easier to use with 
> IMPALA_QUICKSTART_IMAGE_PREFIX: 
> https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1=last_updated




[jira] [Resolved] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8721.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Abhishek Rawat
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: Interoperability, correctness, hive, impala, parquet, 
> timestamp
> Fix For: Impala 4.0
>
>
>  
> Easy to repro on latest upstream:
> {code:java}
> hive> create table t1_hive(c1 timestamp) stored as parquet;
> hive> insert into t1_hive values('2009-03-09 01:20:03.6');
> hive> select * from t1_hive;
> OK
> 2009-03-09 01:20:03.6
> [localhost:21000] default> invalidate metadata t1_hive;
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 09:55:36 (Coordinator: 
> http://optimus-prime:25000)
> Query progress can be monitored at: 
> http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 09:20:03.6 |
> +-----------------------+
> bin/start-impala-cluster.py 
> --impalad_args='-convert_legacy_hive_parquet_utc_timestamps=true'
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 10:00:22 (Coordinator: 
> http://optimus-prime:25000)
> Query progress can be monitored at: 
> http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 02:20:03.6 |
> +-----------------------+
>  
> {code}
>  
> This issue is causing testcase test_hive_impala_interop to fail. Until this 
> issue is fixed, the testcase will be updated to not include a timestamp 
> column. The test case should be updated to include a timestamp column once 
> this issue is fixed.
>  




[jira] [Commented] (IMPALA-7092) Re-enable EC tests broken by HDFS-13539

2021-02-09 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281953#comment-17281953
 ] 

Tim Armstrong commented on IMPALA-7092:
---

These seem to be marked by @SkipIfEC.oom

> Re-enable EC tests broken by HDFS-13539 
> 
>
> Key: IMPALA-7092
> URL: https://issues.apache.org/jira/browse/IMPALA-7092
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Priority: Major
>
> With HDFS-13539 and HDFS-13540 fixed, we should be able to re-enable some 
> tests and diagnose the causes of the remaining failed tests without much 
> noise.




[jira] [Resolved] (IMPALA-9586) Update query option docs to account for interactions with mt_dop

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9586.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Update query option docs to account for interactions with mt_dop
> 
>
> Key: IMPALA-9586
> URL: https://issues.apache.org/jira/browse/IMPALA-9586
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> In some cases mt_dop changes the behaviour of other options or makes them a 
> no-op.  We need to update docs to reflect this.
> * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, 
> i.e. only one thread is used.
> * NUM_SCANNER_THREADS has no effect when MT_DOP>=1
> * Maybe other changes?
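
The two interactions listed above could be modelled roughly as follows. This is an illustrative Python sketch, not Impala code: the function name is made up, and treating a missing option as 0 is an assumption.

```python
def apply_mt_dop_interactions(options):
    """Return (effective_options, ignored_option_names) for a dict of query options."""
    effective = dict(options)
    ignored = []
    if effective.get("MT_DOP", 0) >= 1:
        # NUM_NODES=1 together with MT_DOP >= 1 leaves only one thread in use.
        if effective.get("NUM_NODES", 0) == 1:
            effective["MT_DOP"] = 1
        # NUM_SCANNER_THREADS has no effect once MT_DOP >= 1.
        if "NUM_SCANNER_THREADS" in effective:
            ignored.append("NUM_SCANNER_THREADS")
    return effective, ignored
```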




[jira] [Work started] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9382 started by Tim Armstrong.
-
> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better off storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.
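
A small Python sketch of the two ideas in the description: keep counters in an append-only vector that is sorted lazily, and share one copy of each counter name through a pool. The class and method names are hypothetical, and the counter names used in the example are illustrative only. This is not the RuntimeProfile code.

```python
class CounterSet:
    """Counters stored in flat lists, with lazy ordering and pooled names."""

    _name_pool = {}  # shared across all instances: name -> canonical string

    def __init__(self):
        self._names = []
        self._values = []
        self._order = None  # cached display order, rebuilt only when stale

    @classmethod
    def _intern(cls, name):
        # Return the pooled copy so equal names share one string object.
        return cls._name_pool.setdefault(name, name)

    def add(self, name, value=0):
        """Register a counter and return a handle for later updates."""
        self._names.append(self._intern(name))
        self._values.append(value)
        self._order = None  # invalidate the cached order
        return len(self._names) - 1

    def update(self, handle, delta):
        self._values[handle] += delta

    def sorted_items(self):
        """Return (name, value) pairs in name order, sorting only if needed."""
        if self._order is None:
            self._order = sorted(range(len(self._names)),
                                 key=self._names.__getitem__)
        return [(self._names[i], self._values[i]) for i in self._order]
```

Because the set of counters is generally static after registration, the sort runs once and later calls to `sorted_items()` reuse the cached order, while `update()` never touches it.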




[jira] [Commented] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-09 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281873#comment-17281873
 ] 

Tim Armstrong commented on IMPALA-9382:
---

Actually I should reduce the verbosity of the default option a bit as part 3

> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better off storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.




[jira] [Reopened] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reopened IMPALA-9382:
---

> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better off storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.




[jira] [Updated] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-09 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9382:
--
Fix Version/s: (was: Impala 4.0)

> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better off storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.




[jira] [Work started] (IMPALA-9586) Update query option docs to account for interactions with mt_dop

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9586 started by Tim Armstrong.
-
> Update query option docs to account for interactions with mt_dop
> 
>
> Key: IMPALA-9586
> URL: https://issues.apache.org/jira/browse/IMPALA-9586
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> In some cases mt_dop changes the behaviour of other options or makes them a 
> no-op.  We need to update docs to reflect this.
> * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, 
> i.e. only one thread is used.
> * NUM_SCANNER_THREADS has no effect when MT_DOP>=1
> * Maybe other changes?




[jira] [Assigned] (IMPALA-9378) CPU usage for runtime profiles with multithreading

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9378:
-

Assignee: (was: Tim Armstrong)

> CPU usage for runtime profiles with multithreading
> --
>
> Key: IMPALA-9378
> URL: https://issues.apache.org/jira/browse/IMPALA-9378
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: multithreading, performance
> Attachments: coord_q5_dop0.svg, coord_q5_dop16.svg
>
>
> [~drorke] reports that significant amounts of time can be spent on the 
> runtime profile with higher values of mt_dop. This can impact query 
> performance from the client's point of view since profile serialisation is on 
> the critical path for closing the query. Also serialising the profile for the 
> webserver holds the ClientRequestState's lock, so can block query progress.
> We should figure out how to make this more efficient.




[jira] [Assigned] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9884:
-

Assignee: (was: Tim Armstrong)

> TestAdmissionControllerStress.test_mem_limit failing occasionally
> -
>
> Key: IMPALA-9884
> URL: https://issues.apache.org/jira/browse/IMPALA-9884
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Vihang Karajgaonkar
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: impalad-executors.tar.gz, 
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz
>
>
> Recently, I saw this test failing with the exception trace below. 
> {noformat}
> custom_cluster/test_admission_controller.py:1782: in test_mem_limit
> {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1638: in run_admission_test
> assert metric_deltas['dequeued'] == 0,\
> E   AssertionError: Queued queries should not run until others are made to 
> finish
> E   assert 1 == 0
> {noformat}




[jira] [Created] (IMPALA-10491) Impala parquet scanner should use writer.time.zone when converting Hive timestamps

2021-02-08 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10491:
--

 Summary: Impala parquet scanner should use writer.time.zone when 
converting Hive timestamps
 Key: IMPALA-10491
 URL: https://issues.apache.org/jira/browse/IMPALA-10491
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Tim Armstrong


IMPALA-8721 reports some issues with Hive 3 and timezone conversion.

HIVE-21290 fixed some of the issues, and also sets writer.time.zone in the 
Parquet metadata, which provides a better way to determine how the time zone 
was written. E.g.

{noformat}
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar 
~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta 
/test-warehouse/asdfgh/00_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with 
parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with 
parallelism: 5
file:hdfs://localhost:20500/test-warehouse/asdfgh/00_0
creator: parquet-mr version 1.10.99.7.2.7.0-44 (build 
27344fd5fdaa371e364c604f471b340f8bcf8936)
extra:   writer.date.proleptic = false
extra:   writer.time.zone = America/Los_Angeles
extra:   writer.model.name = 3.1.3000.7.2.7.0-44
{noformat}

We should use this timezone when converting timestamps, I think either always 
or when convert_legacy_hive_parquet_utc_timestamps=true. 
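
As a rough illustration of the proposal, the Python sketch below picks the zone from the file's key/value metadata when writer.time.zone is present and otherwise falls back to a caller-supplied zone, then converts a UTC-normalized value back to the writer's wall-clock time. The function names and the dict-shaped metadata are assumptions, and this is not how the Impala scanner is implemented.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def resolve_writer_zone(file_metadata: dict, fallback: tzinfo) -> tzinfo:
    """Prefer the zone recorded by the writer, else use the fallback zone."""
    name = file_metadata.get("writer.time.zone")
    return ZoneInfo(name) if name else fallback


def utc_to_writer_local(utc_value: datetime, zone: tzinfo) -> datetime:
    """Convert a naive UTC timestamp to naive wall-clock time in `zone`."""
    aware = utc_value.replace(tzinfo=timezone.utc)
    return aware.astimezone(zone).replace(tzinfo=None)
```

For a file like the one shown above, `resolve_writer_zone` would return the America/Los_Angeles zone, provided the time zone database is available on the host.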

CC [~boroknagyz] [~csringhofer]




[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2021-02-08 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281518#comment-17281518
 ] 

Tim Armstrong commented on IMPALA-8721:
---

I think this was fixed by HIVE-21290 - the test passes now if I revert 
IMPALA-8689

> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Abhishek Rawat
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: Interoperability, correctness, hive, impala, parquet, 
> timestamp
>
>  
> Easy to repro on latest upstream:
> {code:java}
> hive> create table t1_hive(c1 timestamp) stored as parquet;
> hive> insert into t1_hive values('2009-03-09 01:20:03.6');
> hive> select * from t1_hive;
> OK
> 2009-03-09 01:20:03.6
> [localhost:21000] default> invalidate metadata t1_hive;
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 09:55:36 (Coordinator: 
> http://optimus-prime:25000)
> Query progress can be monitored at: 
> http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 09:20:03.6 |
> +-----------------------+
> bin/start-impala-cluster.py 
> --impalad_args='-convert_legacy_hive_parquet_utc_timestamps=true'
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 10:00:22 (Coordinator: 
> http://optimus-prime:25000)
> Query progress can be monitored at: 
> http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 02:20:03.6 |
> +-----------------------+
>  
> {code}
>  
> This issue is causing testcase test_hive_impala_interop to fail. Until this 
> issue is fixed, the testcase will be updated to not include a timestamp 
> column. The test case should be updated to include a timestamp column once 
> this issue is fixed.
>  




[jira] [Assigned] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8646:
-

Assignee: (was: Ethan)

> Integrate etcd into Impala for cluster membership updates
> -
>
> Key: IMPALA-8646
> URL: https://issues.apache.org/jira/browse/IMPALA-8646
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Priority: Minor
> Attachments: 91204e6.diff.zip
>
>
> This task involves replacing usage of the statestore membership topic with 
> etcd.




[jira] [Commented] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates

2021-02-08 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281429#comment-17281429
 ] 

Tim Armstrong commented on IMPALA-8646:
---

Uploading patches that [~ethan.xue] had on gerrit

> Integrate etcd into Impala for cluster membership updates
> -
>
> Key: IMPALA-8646
> URL: https://issues.apache.org/jira/browse/IMPALA-8646
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Assignee: Ethan
>Priority: Minor
> Attachments: 91204e6.diff.zip
>
>
> This task involves replacing usage of the statestore membership topic with 
> etcd.




[jira] [Updated] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8646:
--
Attachment: 91204e6.diff.zip

> Integrate etcd into Impala for cluster membership updates
> -
>
> Key: IMPALA-8646
> URL: https://issues.apache.org/jira/browse/IMPALA-8646
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Assignee: Ethan
>Priority: Minor
> Attachments: 91204e6.diff.zip
>
>
> This task involves replacing usage of the statestore membership topic with 
> etcd.




[jira] [Updated] (IMPALA-8645) Write a basic C++ gRPC client for etcd

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8645:
--
Attachment: 5684404.diff.zip
a8e3d99.diff.zip

> Write a basic C++ gRPC client for etcd
> --
>
> Key: IMPALA-8645
> URL: https://issues.apache.org/jira/browse/IMPALA-8645
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Assignee: Ethan
>Priority: Minor
> Attachments: 5684404.diff.zip, a8e3d99.diff.zip
>
>
> This task involves creating a basic C++ gRPC client that can interact with a 
> local etcd pseudo-cluster and has an API that can be used to replace the 
> functionality of the statestore. Also, this client and its dependencies 
> should be integrated into the Impala repo.




[jira] [Assigned] (IMPALA-8645) Write a basic C++ gRPC client for etcd

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8645:
-

Assignee: (was: Ethan)

> Write a basic C++ gRPC client for etcd
> --
>
> Key: IMPALA-8645
> URL: https://issues.apache.org/jira/browse/IMPALA-8645
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Priority: Minor
> Attachments: 5684404.diff.zip, a8e3d99.diff.zip
>
>
> This task involves creating a basic C++ gRPC client that can interact with a 
> local etcd pseudo-cluster and has an API that can be used to replace the 
> functionality of the statestore. Also, this client and its dependencies 
> should be integrated into the Impala repo.




[jira] [Commented] (IMPALA-8645) Write a basic C++ gRPC client for etcd

2021-02-08 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281427#comment-17281427
 ] 

Tim Armstrong commented on IMPALA-8645:
---

Uploading patches that [~ethan.xue] submitted to gerrit

> Write a basic C++ gRPC client for etcd
> --
>
> Key: IMPALA-8645
> URL: https://issues.apache.org/jira/browse/IMPALA-8645
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Ethan
>Assignee: Ethan
>Priority: Minor
> Attachments: 5684404.diff.zip, a8e3d99.diff.zip
>
>
> This task involves creating a basic C++ gRPC client that can interact with a 
> local etcd pseudo-cluster and has an API that can be used to replace the 
> functionality of the statestore. Also, this client and its dependencies 
> should be integrated into the Impala repo.




[jira] [Resolved] (IMPALA-9382) Prototype denser runtime profile implementation

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9382.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

We have a solid prototype

> Prototype denser runtime profile implementation
> ---
>
> Key: IMPALA-9382
> URL: https://issues.apache.org/jira/browse/IMPALA-9382
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_504b379400cba9f2_2d2cf007, 
> tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt
>
>
> RuntimeProfile trees can potentially stress the memory allocator and use up a 
> lot more memory and cache than is really necessary:
> * std::map is used throughout, and allocates a node per map entry. We do 
> depend on the counters being displayed in-order, but we would probably be 
> better of storing the counters in a vector and lazily sorting when needed 
> (since the set of counters is generally static after Prepare()).
> * We store the same counter names redundantly all over the place. We'd 
> probably be best off using a pool of constant counter names (we could just 
> require registering them upfront).
> There may be a small gain from switching thrift to using unordered_map, e.g. 
> for the info strings that appear with some frequency in profiles.
> However, I think we need to restructure the thrift representation and 
> in-memory representation to get significant gains.
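
The two ideas in the description above (a vector that is sorted lazily instead of a std::map, and a pool of shared counter names) can be sketched roughly as follows. This is only an illustration under stated assumptions: CounterNamePool and CounterList are hypothetical names, not classes from the Impala code base, and the prototype referred to in this issue may look quite different.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical pool that stores each distinct counter name exactly once.
class CounterNamePool {
 public:
  // Returns a pointer to the pooled copy of 'name'. Pointers to elements of a
  // std::unordered_set stay valid when the set grows and rehashes.
  const std::string* Intern(const std::string& name) {
    return &*names_.insert(name).first;
  }

 private:
  std::unordered_set<std::string> names_;
};

// Hypothetical counter container: a flat vector that is sorted by counter
// name only when an ordered view is requested.
class CounterList {
 public:
  using Entry = std::pair<const std::string*, int64_t>;

  void Add(const std::string* name, int64_t value) {
    counters_.emplace_back(name, value);
    sorted_ = false;
  }

  // Sorts lazily: only if something was added since the last call.
  const std::vector<Entry>& Sorted() {
    if (!sorted_) {
      std::sort(counters_.begin(), counters_.end(),
          [](const Entry& a, const Entry& b) { return *a.first < *b.first; });
      sorted_ = true;
    }
    return counters_;
  }

 private:
  std::vector<Entry> counters_;
  bool sorted_ = true;
};
```

Because the set of counters is generally static after Prepare(), the sort would typically run once per profile in a design like this, and each entry costs a pointer plus a value rather than a separately allocated map node.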




[jira] [Assigned] (IMPALA-7885) Create function to convert to ts from unix millis

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7885:
-

Assignee: (was: Tim Armstrong)

> Create function to convert to ts from unix millis
> -
>
> Key: IMPALA-7885
> URL: https://issues.apache.org/jira/browse/IMPALA-7885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: eugen yushin
>Priority: Major
>  Labels: ramp-up
>
> There are several functions such as 
> `from_unixtime`/`unix_micros_to_utc_timestamp`/`to_timestamp` in Impala that 
> accept seconds or micros, but none of them works with millis.
> At the same time, Impala already has all necessary utility methods to add 
> such a functionality:
> [https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-value.inline.h#L54]
> {code}
> inline TimestampValue TimestampValue::UtcFromUnixTimeMillis(int64_t 
> unix_time_millis) {
>  return UtcFromUnixTimeTicks(unix_time_millis);
> }
> {code}
> https://github.com/apache/impala/blob/master/be/src/exprs/timestamp-functions-ir.cc#L141
> {code}
> TimestampVal TimestampFunctions::UnixMicrosToUtcTimestamp(FunctionContext* 
> context,
> const BigIntVal& unix_time_micros) {
>   if (unix_time_micros.is_null) return TimestampVal::null();
>   TimestampValue tv = 
> TimestampValue::UtcFromUnixTimeMicros(unix_time_micros.val);
>   TimestampVal result;
>   tv.ToTimestampVal(&result);
>   return result;
> }
> {code}
> It would be better to have a Unix millis to timestamp conversion function as 
> built-in functionality, to avoid:
> - creating cumbersome 'aliases' like:
> {code}
> select unix_micros_to_utc_timestamp(1513895588243 * 1000)
> {code}
> or
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Why-not-from-unixtime-function-handles-an-unix-timestamp-in/m-p/63182#M3969
> {code}
> select cast(1513895588243 div 1000 as timestamp) + interval (1513895588243 % 
> 1000) milliseconds;
> {code}
> - writing relatively slow udfs in java
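
As a rough, self-contained sketch of the arithmetic such a built-in would need, the snippet below splits Unix milliseconds into whole seconds and a sub-second remainder. UnixTimeParts and SplitUnixMillis are hypothetical names used for illustration only; they are not part of Impala, where the existing TimestampValue helpers quoted above would do this work.

```cpp
#include <cstdint>

// Hypothetical stand-in for a timestamp: whole seconds since the Unix epoch
// plus a non-negative sub-second part in nanoseconds.
struct UnixTimeParts {
  int64_t seconds;
  int32_t nanos;
};

// Splits Unix milliseconds into seconds and nanoseconds. The seconds are
// rounded toward negative infinity so that 'nanos' is always in [0, 1e9),
// which keeps timestamps before 1970 consistent.
inline UnixTimeParts SplitUnixMillis(int64_t unix_time_millis) {
  constexpr int64_t kMillisPerSec = 1000;
  constexpr int64_t kNanosPerMilli = 1000000;
  int64_t seconds = unix_time_millis / kMillisPerSec;
  int64_t millis = unix_time_millis % kMillisPerSec;
  if (millis < 0) {
    // C++ integer division truncates toward zero; adjust to floor semantics.
    millis += kMillisPerSec;
    --seconds;
  }
  return {seconds, static_cast<int32_t>(millis * kNanosPerMilli)};
}
```

For the value used in the examples above, 1513895588243, this yields 1513895588 seconds and 243000000 nanoseconds, which is the same split the `div 1000` / `% 1000` workaround performs by hand.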




[jira] [Assigned] (IMPALA-9846) Switch to aggregated runtime profile representation

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9846:
-

Assignee: (was: Tim Armstrong)

> Switch to aggregated runtime profile representation
> ---
>
> Key: IMPALA-9846
> URL: https://issues.apache.org/jira/browse/IMPALA-9846
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: multithreading
>
> We need to ensure that the aggregated profile is an adequate replacement, 
> then switch over the default.




[jira] [Assigned] (IMPALA-6973) auth_to_local not considered for delegated users

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6973:
-

Assignee: (was: Tim Armstrong)

> auth_to_local not considered for delegated users
> 
>
> Key: IMPALA-6973
> URL: https://issues.apache.org/jira/browse/IMPALA-6973
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Adriano
>Priority: Major
>  Labels: seca
>
> When user names are stored in Active Directory in UPPERCASE but all 
> user names in Linux/CDH are lowercase, the names are usually converted with 
> an auth_to_local rule.
> For example, to perform this conversion we use the rule:
> auth_to_local=RULE:[1:$1@$0](.*@*.COMPANY.COM)s/@.*///L
> with the "/L" switch to convert user names to lower case.
> This works for "normal user" authentication, i.e. the web interfaces and 
> access to Impala via ODBC.
> However, when a delegation user is used, the auth_to_local rule is not 
> applied, and to make it work the  should be configured 
> in UPPERCASE.
> We check auth_to_local for user authentication:
> https://github.com/cloudera/Impala/blob/cdh5-2.5.0_5.7.5/fe/src/main/java/com/cloudera/impala/authorization/User.java
> but not for the delegated user:
> https://github.com/cloudera/Impala/blob/87482a4f367f8c1edd12af494e4992ac8f7aa3ba/be/src/service/impala-hs2-server.cc#L308-L336
> https://github.com/cloudera/Impala/blob/cdh5-2.5.0_5.7.5/be/src/service/impala-server.cc#L1197-L1230
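
To make the expected behaviour concrete, here is a minimal sketch of what the rule above does to a principal: it drops the realm and lower-cases the rest. It only approximates the effect of `s/@.*//` with the `/L` flag and is not Hadoop's rule engine or Impala's implementation; ToLocalUserName and SameLocalUser are hypothetical helpers.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Approximates the effect of the rule s/@.*// with the /L flag: drop
// everything from the first '@' onwards and lower-case what remains.
inline std::string ToLocalUserName(const std::string& principal) {
  // find() returns npos when there is no '@', so substr keeps the whole name.
  std::string name = principal.substr(0, principal.find('@'));
  std::transform(name.begin(), name.end(), name.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return name;
}

// Hypothetical comparison that applies the same mapping to both sides, which
// is what the report asks for in the delegated-user code path.
inline bool SameLocalUser(const std::string& a, const std::string& b) {
  return ToLocalUserName(a) == ToLocalUserName(b);
}
```

With a mapping like this applied on the delegation path as well, a configured lowercase name and an UPPERCASE principal from Active Directory would compare equal.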




[jira] [Assigned] (IMPALA-9774) impala-shell regression when connecting to cluster with SSL enabled

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9774:
-

Assignee: (was: Tim Armstrong)

> impala-shell regression when connecting to cluster with SSL enabled
> ---
>
> Key: IMPALA-9774
> URL: https://issues.apache.org/jira/browse/IMPALA-9774
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Reporter: Tim Armstrong
>Priority: Major
>
> {noformat}
> $ impala-shell -i xxx.vpc.cloudera.com -d default -k --ssl --ca_cert 
> /xx.pem
> Starting Impala Shell with Kerberos authentication using Python 2.7.5
> Using service name 'impala'
> SSL is enabled
> No handlers could be found for logger "thrift.transport.sslcompat"
> Error connecting: NotImplementedError, Wrong number of arguments for 
> overloaded function 'Client_setAttr'.
>   Possible C/C++ prototypes are:
> setAttr(saslwrapper::Client *,std::string const &,std::string const &)
> setAttr(saslwrapper::Client *,std::string const &,uint32_t)
> {noformat}
> This was caused by the unicode changes in "IMPALA-3343, IMPALA-9489: Make 
> impala-shell compatible with python 3" - in some places a unicode string gets 
> passed into the sasl library, and the older version of the library can't 
> handle it. The SASL upgrade - IMPALA-9719 - fixes it.




[jira] [Updated] (IMPALA-3902) Multi-threaded query execution

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3902:
--
Fix Version/s: Impala 4.0

> Multi-threaded query execution
> --
>
> Key: IMPALA-3902
> URL: https://issues.apache.org/jira/browse/IMPALA-3902
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Marcel Kinard
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: multithreading
> Fix For: Impala 4.0
>
>
> Currently, a single query fragment is run in a quasi-single threaded manner 
> on a node: the scanners are run in multiple threads, but all other operators 
> (joins, aggregation) are run in the main thread.
> The goal is to add multi-threaded execution on a single node by running 
> multiple fragment instances (each of which runs in a single thread).
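
The general shape of "multiple fragment instances, each running in a single thread" can be illustrated with a toy aggregation. This is a sketch only: RunInstance and ParallelSum are hypothetical names, and real fragment instances exchange row batches rather than summing a vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical "fragment instance": processes one contiguous slice of the
// input on a single thread and writes its partial result to 'out'.
inline void RunInstance(const std::vector<int64_t>& rows, size_t begin, size_t end,
    int64_t* out) {
  int64_t sum = 0;
  for (size_t i = begin; i < end; ++i) sum += rows[i];
  *out = sum;
}

// Runs 'dop' instances in parallel, one thread each, then merges the partial
// results on the calling thread.
inline int64_t ParallelSum(const std::vector<int64_t>& rows, size_t dop) {
  if (dop == 0) dop = 1;
  std::vector<int64_t> partial(dop, 0);
  std::vector<std::thread> threads;
  const size_t chunk = (rows.size() + dop - 1) / dop;
  for (size_t i = 0; i < dop; ++i) {
    const size_t begin = std::min(i * chunk, rows.size());
    const size_t end = std::min(begin + chunk, rows.size());
    threads.emplace_back(RunInstance, std::cref(rows), begin, end, &partial[i]);
  }
  for (std::thread& t : threads) t.join();
  int64_t total = 0;
  for (int64_t p : partial) total += p;
  return total;
}
```

Each instance stays single-threaded internally, so the operator code itself needs no locking; parallelism comes from the number of instances, and the only shared step is the final merge.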




[jira] [Resolved] (IMPALA-3902) Multi-threaded query execution

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3902.
---
Resolution: Fixed

Closing because it is now broadly usable for many use cases. There are some 
issues that might cause challenges for migration of complex workloads that I 
have moved to IMPALA-10486.

> Multi-threaded query execution
> --
>
> Key: IMPALA-3902
> URL: https://issues.apache.org/jira/browse/IMPALA-3902
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Marcel Kinard
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: multithreading
>
> Currently, a single query fragment is run in a quasi-single threaded manner 
> on a node: the scanners are run in multiple threads, but all other operators 
> (joins, aggregation) are run in the main thread.
> The goal is to add multi-threaded execution on a single node by running 
> multiple fragment instances (each of which runs in a single thread).




[jira] [Assigned] (IMPALA-9586) Update query option docs to account for interactions with mt_dop

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9586:
-

Assignee: Tim Armstrong

> Update query option docs to account for interactions with mt_dop
> 
>
> Key: IMPALA-9586
> URL: https://issues.apache.org/jira/browse/IMPALA-9586
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> In some cases mt_dop changes the behaviour of other options or makes them a 
> no-op. We need to update the docs to reflect this.
> * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, 
> i.e. only one thread is used.
> * NUM_SCANNER_THREADS has no effect when MT_DOP>=1
> * Maybe other changes?
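
The two interactions listed above can be written down as a small decision function, which may help when wording the docs. This is a sketch of the behaviour as described in this issue only, not Impala's planner code; the struct and function names are hypothetical, and the mt_dop = 0 branch assumes the classic model of one fragment instance per node.

```cpp
// Hypothetical subset of query options relevant to this issue.
struct QueryOptions {
  int num_nodes;            // 1 means single-node execution in this sketch
  int mt_dop;               // 0 means multithreading is disabled
  int num_scanner_threads;  // ignored when mt_dop >= 1
};

// Hypothetical summary of what the options amount to.
struct EffectiveParallelism {
  int fragment_instances_per_node;
  bool scanner_threads_honored;
};

// Encodes the interactions described above:
//  - NUM_NODES=1 together with MT_DOP >= 1 effectively reduces MT_DOP to 1.
//  - NUM_SCANNER_THREADS has no effect when MT_DOP >= 1.
inline EffectiveParallelism Resolve(const QueryOptions& o) {
  EffectiveParallelism e;
  if (o.mt_dop >= 1) {
    e.fragment_instances_per_node = (o.num_nodes == 1) ? 1 : o.mt_dop;
    e.scanner_threads_honored = false;
  } else {
    e.fragment_instances_per_node = 1;
    e.scanner_threads_honored = true;
  }
  return e;
}
```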




[jira] [Created] (IMPALA-10486) Multithreading upgrade path for large clusters

2021-02-08 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10486:
--

 Summary: Multithreading upgrade path for large clusters
 Key: IMPALA-10486
 URL: https://issues.apache.org/jira/browse/IMPALA-10486
 Project: IMPALA
  Issue Type: Epic
  Components: Backend
Reporter: Tim Armstrong


Issues needed to be able to smoothly enable multithreading for existing 
workloads.




[jira] [Assigned] (IMPALA-3902) Multi-threaded query execution

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3902:
-

Assignee: (was: Tim Armstrong)

> Multi-threaded query execution
> --
>
> Key: IMPALA-3902
> URL: https://issues.apache.org/jira/browse/IMPALA-3902
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Marcel Kinard
>Priority: Minor
>  Labels: multithreading
>
> Currently, a single query fragment is run in a quasi-single threaded manner 
> on a node: the scanners are run in multiple threads, but all other operators 
> (joins, aggregation) are run in the main thread.
> The goal is to add multi-threaded execution on a single node by running 
> multiple fragment instances (each of which runs in a single thread).




[jira] [Assigned] (IMPALA-3902) Multi-threaded query execution

2021-02-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3902:
-

Assignee: Tim Armstrong

> Multi-threaded query execution
> --
>
> Key: IMPALA-3902
> URL: https://issues.apache.org/jira/browse/IMPALA-3902
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Marcel Kinard
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: multithreading
>
> Currently, a single query fragment is run in a quasi-single threaded manner 
> on a node: the scanners are run in multiple threads, but all other operators 
> (joins, aggregation) are run in the main thread.
> The goal is to add multi-threaded execution on a single node by running 
> multiple fragment instances (each of which runs in a single thread).




[jira] [Assigned] (IMPALA-10470) Update wiki with info about Impala quickstart

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-10470:
--

Assignee: Tim Armstrong

> Update wiki with info about Impala quickstart
> -
>
> Key: IMPALA-10470
> URL: https://issues.apache.org/jira/browse/IMPALA-10470
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>





[jira] [Assigned] (IMPALA-10469) Support pushing quickstart images to Apache repo

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-10469:
--

Assignee: Tim Armstrong

> Support pushing quickstart images to Apache repo
> 
>
> Key: IMPALA-10469
> URL: https://issues.apache.org/jira/browse/IMPALA-10469
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> We need a naming scheme and maybe a script to do the push. We've so far 
> assumed a different repository for each image, but in the Apache docker, we 
> only have a single repository and need to encode the image type and version 
> into the tag
> See  https://hub.docker.com/repository/docker/apache/kudu for an example.
> They have:
> apache/kudu:
> apache/kudu:kudu-python-
> apache/kudu:impala-latest
> Airflow does the opposite, and this might be easier to use with 
> IMPALA_QUICKSTART_IMAGE_PREFIX: 
> https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1=last_updated




[jira] [Work started] (IMPALA-10469) Support pushing quickstart images to Apache repo

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10469 started by Tim Armstrong.
--
> Support pushing quickstart images to Apache repo
> 
>
> Key: IMPALA-10469
> URL: https://issues.apache.org/jira/browse/IMPALA-10469
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> We need a naming scheme and maybe a script to do the push. We've so far 
> assumed a different repository for each image, but in the Apache docker, we 
> only have a single repository and need to encode the image type and version 
> into the tag
> See  https://hub.docker.com/repository/docker/apache/kudu for an example.
> They have:
> apache/kudu:
> apache/kudu:kudu-python-
> apache/kudu:impala-latest
> Airflow does the opposite, and this might be easier to use with 
> IMPALA_QUICKSTART_IMAGE_PREFIX: 
> https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1=last_updated




[jira] [Work stopped] (IMPALA-9793) Improved Impala quickstart

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9793 stopped by Tim Armstrong.
-
> Improved Impala quickstart
> --
>
> Key: IMPALA-9793
> URL: https://issues.apache.org/jira/browse/IMPALA-9793
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> Kudu built a single container quickstart here 
> https://github.com/apache/kudu/tree/master/examples/quickstart/impala .
> We should do a better quickstart container with the following features:
> * Store data in docker volumes
> * Use the daemon containers that are more production ready
> * Have an easy solution for loading data
> * Support Kudu and Hive tables.




[jira] [Work started] (IMPALA-9793) Improved Impala quickstart

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9793 started by Tim Armstrong.
-
> Improved Impala quickstart
> --
>
> Key: IMPALA-9793
> URL: https://issues.apache.org/jira/browse/IMPALA-9793
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> Kudu built a single container quickstart here 
> https://github.com/apache/kudu/tree/master/examples/quickstart/impala .
> We should do a better quickstart container with the following features:
> * Store data in docker volumes
> * Use the daemon containers that are more production ready
> * Have an easy solution for loading data
> * Support Kudu and Hive tables.




[jira] [Updated] (IMPALA-10469) Support pushing quickstart images to Apache repo

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10469:
---
Description: 
We need a naming scheme and maybe a script to do the push. We've so far assumed 
a different repository for each image, but in the Apache docker, we only have a 
single repository and need to encode the image type and version into the tag

See  https://hub.docker.com/repository/docker/apache/kudu for an example.

They have:
apache/kudu:
apache/kudu:kudu-python-
apache/kudu:impala-latest

Airflow does the opposite, and this might be easier to use with 
IMPALA_QUICKSTART_IMAGE_PREFIX: 

https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1=last_updated

  was:
We need a naming scheme and maybe a script to do the push. We've so far assumed 
a different repository for each image, but in the Apache docker, we only have a 
single repository and need to encode the image type and version into the tag

See  https://hub.docker.com/repository/docker/apache/kudu for an example.

They have:
apache/kudu:
apache/kudu:kudu-python-
apache/kudu:impala-latest


> Support pushing quickstart images to Apache repo
> 
>
> Key: IMPALA-10469
> URL: https://issues.apache.org/jira/browse/IMPALA-10469
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>
> We need a naming scheme and maybe a script to do the push. We've so far 
> assumed a different repository for each image, but in the Apache docker, we 
> only have a single repository and need to encode the image type and version 
> into the tag
> See  https://hub.docker.com/repository/docker/apache/kudu for an example.
> They have:
> apache/kudu:
> apache/kudu:kudu-python-
> apache/kudu:impala-latest
> Airflow does the opposite, and this might be easier to use with 
> IMPALA_QUICKSTART_IMAGE_PREFIX: 
> https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1=last_updated




[jira] [Resolved] (IMPALA-10389) Container for impala-profile-tool

2021-02-05 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10389.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Container for impala-profile-tool
> -
>
> Key: IMPALA-10389
> URL: https://issues.apache.org/jira/browse/IMPALA-10389
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> Following on from IMPALA-9865, it would be useful to have a docker container 
> available to dump out Impala profiles - this would make it wayyy easier to 
> consume profile logs without installing anything complex.




[jira] [Commented] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML

2021-02-04 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279237#comment-17279237
 ] 

Tim Armstrong commented on IMPALA-10475:


I think I was a bit sloppy saying "traditional"; we don't use that in the doc. 
I guess maybe we should actually just say that it does apply to all 
filesystem-based tables - transactional tables will be strongly consistent 
anyway, so the user-facing behaviour will be the same as if SYNC_DDL was used.

Maybe we can just weave it into the original sentence "Although INSERT is 
classified as a DML statement, when the SYNC_DDL option is enabled, INSERT 
statements on filesystem-based tables ..."

> SYNC_DDL docs should clarify that it only affects DML
> -
>
> Key: IMPALA-10475
> URL: https://issues.apache.org/jira/browse/IMPALA-10475
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: shajini thayasingh
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html
> {noformat}
> Although INSERT is classified as a DML statement, when the SYNC_DDL option is 
> enabled, INSERT statements also delay their completion until all the 
> underlying data and metadata changes are propagated to all Impala nodes. 
> Internally, Impala inserts have similarities with DDL statements in 
> traditional database systems, because they create metadata needed to track 
> HDFS block locations for new files and they potentially add new partitions to 
> partitioned tables. 
> {noformat}
> I saw someone read this as applying to all tables (Kudu, HBase, etc) but it 
> only inherently applies to traditional non-transactional filesystem-based 
> tables. It also applies to transactional tables until IMPALA-8631 is fixed, 
> after which they will be more strongly consistent.




[jira] [Updated] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML

2021-02-04 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10475:
---
Issue Type: Task  (was: Documentation)

> SYNC_DDL docs should clarify that it only affects DML
> -
>
> Key: IMPALA-10475
> URL: https://issues.apache.org/jira/browse/IMPALA-10475
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: shajini thayasingh
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html
> {noformat}
> Although INSERT is classified as a DML statement, when the SYNC_DDL option is 
> enabled, INSERT statements also delay their completion until all the 
> underlying data and metadata changes are propagated to all Impala nodes. 
> Internally, Impala inserts have similarities with DDL statements in 
> traditional database systems, because they create metadata needed to track 
> HDFS block locations for new files and they potentially add new partitions to 
> partitioned tables. 
> {noformat}
> I saw someone read this as applying to all tables (Kudu, HBase, etc) but it 
> only inherently applies to traditional non-transactional filesystem-based 
> tables. It also applies to transactional tables until IMPALA-8631 is fixed, 
> after which they will be more strongly consistent.




[jira] [Assigned] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML

2021-02-04 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-10475:
--

Assignee: shajini thayasingh

> SYNC_DDL docs should clarify that it only affects DML
> -
>
> Key: IMPALA-10475
> URL: https://issues.apache.org/jira/browse/IMPALA-10475
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: shajini thayasingh
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html
> {noformat}
> Although INSERT is classified as a DML statement, when the SYNC_DDL option is 
> enabled, INSERT statements also delay their completion until all the 
> underlying data and metadata changes are propagated to all Impala nodes. 
> Internally, Impala inserts have similarities with DDL statements in 
> traditional database systems, because they create metadata needed to track 
> HDFS block locations for new files and they potentially add new partitions to 
> partitioned tables. 
> {noformat}
> I saw someone read this as applying to all tables (Kudu, HBase, etc) but it 
> only inherently applies to traditional non-transactional filesystem-based 
> tables. It also applies to transactional tables until IMPALA-8631 is fixed, 
> after which they will be more strongly consistent.




[jira] [Updated] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML

2021-02-04 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10475:
---
Issue Type: Documentation  (was: Bug)

> SYNC_DDL docs should clarify that it only affects DML
> -
>
> Key: IMPALA-10475
> URL: https://issues.apache.org/jira/browse/IMPALA-10475
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Tim Armstrong
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html
> {noformat}
> Although INSERT is classified as a DML statement, when the SYNC_DDL option is 
> enabled, INSERT statements also delay their completion until all the 
> underlying data and metadata changes are propagated to all Impala nodes. 
> Internally, Impala inserts have similarities with DDL statements in 
> traditional database systems, because they create metadata needed to track 
> HDFS block locations for new files and they potentially add new partitions to 
> partitioned tables. 
> {noformat}
> I saw someone read this as applying to all tables (Kudu, HBase, etc) but it 
> only inherently applies to traditional non-transactional filesystem-based 
> tables. It also applies to transactional tables until IMPALA-8631 is fixed, 
> after which they will be more strongly consistent.




[jira] [Created] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML

2021-02-04 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10475:
--

 Summary: SYNC_DDL docs should clarify that it only affects DML
 Key: IMPALA-10475
 URL: https://issues.apache.org/jira/browse/IMPALA-10475
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Reporter: Tim Armstrong


https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html

{noformat}
Although INSERT is classified as a DML statement, when the SYNC_DDL option is 
enabled, INSERT statements also delay their completion until all the underlying 
data and metadata changes are propagated to all Impala nodes. Internally, 
Impala inserts have similarities with DDL statements in traditional database 
systems, because they create metadata needed to track HDFS block locations for 
new files and they potentially add new partitions to partitioned tables. 
{noformat}

I saw someone read this as applying to all tables (Kudu, HBase, etc) but it 
only inherently applies to traditional non-transactional filesystem-based 
tables. It also applies to transactional tables until IMPALA-8631 is fixed, 
after which they will be more strongly consistent.




[jira] [Assigned] (IMPALA-4373) Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.

2021-02-03 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4373:
-

Assignee: (was: Tim Armstrong)

> Wrong results with correlated WHERE-clause subquery inside a NULL-checking 
> conditional function.
> 
>
> Key: IMPALA-4373
> URL: https://issues.apache.org/jira/browse/IMPALA-4373
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0
>Reporter: Alexander Behm
>Priority: Critical
>  Labels: correctness
>
> Impala may generate an incorrect plan for queries that have a correlated 
> scalar subquery as a parameter to a NULL-checking conditional function like 
> ISNULL().
> Example query and incorrect plan:
> {code}
> select t1.int_col
> from functional.alltypessmall as t1
> where t1.int_col >= isnull
> (
>   (
>     SELECT
>       MAX(t2.bigint_col)
>     FROM
>       functional.alltypestiny AS t2
>     WHERE
>       t1.id = t2.id + 1
>   ),
>   0
> )
> Fetched 0 row(s) in 1.09s
> Single-node plan:
> +-----------------------------------------------------------------------+
> | Explain String                                                        |
> +-----------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
> |                                                                       |
> | PLAN-ROOT SINK                                                        |
> | |                                                                     |
> | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
> | |  hash predicates: t1.id = t2.id + 1                                 |
> | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
> | |  runtime filters: RF000 <- t2.id + 1                                |
> | |                                                                     |
> | |--02:AGGREGATE [FINALIZE]                                            |
> | |  |  output: max(t2.bigint_col)                                      |
> | |  |  group by: t2.id                                                 |
> | |  |                                                                  |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
> | |     partitions=4/4 files=4 size=460B                                |
> | |                                                                     |
> | 00:SCAN HDFS [functional.alltypessmall t1]                            |
> |    partitions=4/4 files=4 size=6.32KB                                 |
> |    runtime filters: RF000 -> t1.id                                    |
> +-----------------------------------------------------------------------+
> {code}
> The query returns an empty result set but instead should return all rows from 
> t1 because all invocations of the subquery return NULL, and all rows from t1 
> satisfy "t1.int_col >= 0".
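To make the expected semantics concrete, here is a small Python sketch (not Impala code) that evaluates the predicate two ways over made-up rows: once with the subquery evaluated per outer row, where an empty subquery yields NULL and isnull() maps it to 0, and once the way the LEFT SEMI JOIN plan above behaves. The row values and helper names are assumptions for illustration only; the real functional.* tables are not reproduced.

```python
# Hypothetical rows chosen so that no t1.id equals any t2.id + 1.
T1 = [{"id": 0, "int_col": 0}, {"id": 5, "int_col": 3}, {"id": 9, "int_col": 7}]
T2 = [{"id": 100, "bigint_col": 10}, {"id": 200, "bigint_col": 20}]


def subquery_max(t1_id, t2_rows):
    """MAX(t2.bigint_col) WHERE t1.id = t2.id + 1; None models SQL NULL."""
    values = [r["bigint_col"] for r in t2_rows if t1_id == r["id"] + 1]
    return max(values) if values else None


def isnull(value, default):
    """ISNULL(a, b): return b when a is NULL, otherwise a."""
    return default if value is None else value


def expected_semantics(t1_rows, t2_rows):
    """Evaluate the scalar subquery once per outer row, as the SQL is written."""
    return [r["int_col"] for r in t1_rows
            if r["int_col"] >= isnull(subquery_max(r["id"], t2_rows), 0)]


def semi_join_plan(t1_rows, t2_rows):
    """Mimic the plan: aggregate t2 by id, then LEFT SEMI JOIN on t1.id = t2.id + 1."""
    max_by_id = {}
    for r in t2_rows:
        max_by_id[r["id"]] = max(max_by_id.get(r["id"], r["bigint_col"]),
                                 r["bigint_col"])
    out = []
    for r in t1_rows:
        for t2_id, mx in max_by_id.items():
            # The isnull() is only evaluated for rows that already matched.
            if r["id"] == t2_id + 1 and r["int_col"] >= isnull(mx, 0):
                out.append(r["int_col"])
                break
    return out
```

With these rows, the per-row evaluation keeps every t1 row, while the semi-join form keeps none, because a t1 row with no join partner never reaches the isnull() predicate at all.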




[jira] [Commented] (IMPALA-4373) Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.

2021-02-03 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278346#comment-17278346
 ] 

Tim Armstrong commented on IMPALA-4373:
---

The fix might be similar to IMPALA-10382 that [~xqhe] fixed recently.

> Wrong results with correlated WHERE-clause subquery inside a NULL-checking 
> conditional function.
> 
>
> Key: IMPALA-4373
> URL: https://issues.apache.org/jira/browse/IMPALA-4373
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0
>Reporter: Alexander Behm
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: correctness
>
> Impala may generate an incorrect plan for queries that have a correlated 
> scalar subquery as a parameter to a NULL-checking conditional function like 
> ISNULL().
> Example query and incorrect plan:
> {code}
> select t1.int_col
> from functional.alltypessmall as t1
> where t1.int_col >= isnull
> (
>   (
>     SELECT
>       MAX(t2.bigint_col)
>     FROM
>       functional.alltypestiny AS t2
>     WHERE
>       t1.id = t2.id + 1
>   ),
>   0
> )
> Fetched 0 row(s) in 1.09s
> Single-node plan:
> +-----------------------------------------------------------------------+
> | Explain String                                                        |
> +-----------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
> |                                                                       |
> | PLAN-ROOT SINK                                                        |
> | |                                                                     |
> | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
> | |  hash predicates: t1.id = t2.id + 1                                 |
> | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
> | |  runtime filters: RF000 <- t2.id + 1                                |
> | |                                                                     |
> | |--02:AGGREGATE [FINALIZE]                                            |
> | |  |  output: max(t2.bigint_col)                                      |
> | |  |  group by: t2.id                                                 |
> | |  |                                                                  |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
> | |     partitions=4/4 files=4 size=460B                                |
> | |                                                                     |
> | 00:SCAN HDFS [functional.alltypessmall t1]                            |
> |    partitions=4/4 files=4 size=6.32KB                                 |
> |    runtime filters: RF000 -> t1.id                                    |
> +-----------------------------------------------------------------------+
> {code}
> The query returns an empty result set but instead should return all rows from 
> t1 because all invocations of the subquery return NULL, and all rows from t1 
> satisfy "t1.int_col >= 0".




[jira] [Created] (IMPALA-10470) Update wiki with info about Impala quickstart

2021-02-02 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10470:
--

 Summary: Update wiki with info about Impala quickstart
 Key: IMPALA-10470
 URL: https://issues.apache.org/jira/browse/IMPALA-10470
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Tim Armstrong







[jira] [Created] (IMPALA-10469) Support pushing quickstart images to Apache repo

2021-02-02 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10469:
--

 Summary: Support pushing quickstart images to Apache repo
 Key: IMPALA-10469
 URL: https://issues.apache.org/jira/browse/IMPALA-10469
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Reporter: Tim Armstrong


We need a naming scheme and maybe a script to do the push. We've so far assumed 
a different repository for each image, but in the Apache docker, we only have a 
single repository and need to encode the image type and version into the tag.

See https://hub.docker.com/repository/docker/apache/kudu for an example.

They have:
apache/kudu:
apache/kudu:kudu-python-
apache/kudu:impala-latest
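
One possible shape for such a naming helper is sketched below in Python. Everything in it is an assumption rather than a decided scheme: the "<image type>-<version>" ordering simply mirrors the Kudu tags listed above, and the repository name and image type used in the usage note are placeholders.

```python
import string

# Characters Docker accepts in a tag; a tag may not start with "." or "-".
_TAG_CHARS = set(string.ascii_letters + string.digits + "_.-")
_MAX_TAG_LEN = 128


def quickstart_tag(image_type, version):
    """Encode image type and version into one tag (hypothetical scheme)."""
    tag = "{}-{}".format(image_type, version)
    if not image_type or not version:
        raise ValueError("image type and version must be non-empty")
    if len(tag) > _MAX_TAG_LEN or tag[0] in ".-":
        raise ValueError("invalid tag: {!r}".format(tag))
    if any(ch not in _TAG_CHARS for ch in tag):
        raise ValueError("invalid character in tag: {!r}".format(tag))
    return tag


def image_ref(repository, image_type, version):
    """Full reference, e.g. for 'docker tag' followed by 'docker push'."""
    return "{}:{}".format(repository, quickstart_tag(image_type, version))
```

For example, image_ref("apache/impala", "impalad_coordinator", "latest") would produce a single-repository reference with the image type folded into the tag; both names are made up here.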




[jira] [Assigned] (IMPALA-6452) RegEx option support for regexp_extract

2021-02-02 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6452:
-

Assignee: (was: Tim Armstrong)

> RegEx option support for regexp_extract
> ---
>
> Key: IMPALA-6452
> URL: https://issues.apache.org/jira/browse/IMPALA-6452
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Harsh J
>Priority: Minor
>  Labels: ramp-up
>
> Impala's {{regexp_like}} supports passing options that enable newline and 
> multi-line matching patterns. The same isn't supported for 
> {{regexp_extract}}, forcing users to resort to using {{split_part}} or other 
> techniques that work with newline characters in a string.
>  
> Please consider supporting options similar to those available in 
> {{regexp_like}} in {{regexp_extract}}.
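
As a rough illustration of the requested behaviour, the Python sketch below emulates a regexp_extract that takes an options string. It assumes the option letters mean what they do for regexp_like ('c' case-sensitive, 'i' case-insensitive, 'm' multi-line anchors, 'n' dot matches newline). It uses Python's re module rather than the regex library Impala uses, and returning an empty string on no match is a choice made for this sketch, so treat it as a model of the semantics, not of the implementation.

```python
import re

# Assumed mapping from regexp_like-style option letters to Python re flags.
_OPTION_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "n": re.DOTALL}


def regexp_extract(source, pattern, index, options=""):
    """Return capture group `index` of the first match, or "" if none."""
    flags = 0
    for opt in options:
        if opt == "c":
            # Case-sensitive: clear any earlier 'i'.
            flags &= ~int(re.IGNORECASE)
        elif opt in _OPTION_FLAGS:
            flags |= _OPTION_FLAGS[opt]
        else:
            raise ValueError("unsupported option: {!r}".format(opt))
    match = re.search(pattern, source, flags)
    return match.group(index) if match else ""
```

Without options, a pattern such as 'a.b' cannot span the newline in 'a\nb'; with 'n' it can, which is the case that currently pushes users towards split_part.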




[jira] [Assigned] (IMPALA-9690) Bump minimum x86-64 CPU requirements

2021-02-01 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9690:
-

Assignee: (was: Tim Armstrong)

> Bump minimum x86-64 CPU requirements
> 
>
> Key: IMPALA-9690
> URL: https://issues.apache.org/jira/browse/IMPALA-9690
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Blocker
>  Labels: performance
>
> We still have a minimum CPU requirement of SSSE3 support 
> https://impala.apache.org/docs/build/html/topics/impala_prereqs.html. I.e. we 
> don't assume SSE4.2 or AVX or AVX2.
> There is a lot of legacy code to support CPUs without SSE4.2 and various 
> other extensions. As a start, here are all the locations in the code where we 
> branch based on CPU feature:
> {noformat}
> :~/impala/impala$ git grep CpuInfo::IsSupport
> be/src/benchmarks/int-hash-benchmark.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_2)) suite32.BENCH(uint32_t, CRC);
> be/src/benchmarks/int-hash-benchmark.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_2)) {
> be/src/benchmarks/int-hash-benchmark.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_1)) {
> be/src/benchmarks/int-hash-benchmark.cc:  if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> be/src/benchmarks/string-compare-benchmark.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_2)) {
> be/src/exec/delimited-text-parser.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_2)) {
> be/src/exec/delimited-text-parser.cc:  if (CpuInfo::IsSupported(CpuInfo::SSE4_2)) {
> be/src/exec/delimited-text-parser.inline.h:  DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/exec/delimited-text-parser.inline.h:  if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) {
> be/src/runtime/io/disk-io-mgr.cc:  if (!CpuInfo::IsSupported(CpuInfo::SSE4_2)) {
> be/src/util/bit-util-test.cc:  if (CpuInfo::IsSupported(CpuInfo::SSSE3)) {
> be/src/util/bit-util-test.cc:  if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> be/src/util/bit-util-test.cc:  if (CpuInfo::IsSupported(cpu_info_flag)) {
> be/src/util/bit-util-test.cc:// CpuInfo::IsSupported() checks. This doesn't test the bug precisely but is a canary for
> be/src/util/bit-util.cc:if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> be/src/util/bit-util.cc:} else if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSSE3))) {
> be/src/util/bit-util.cc:if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSSE3))) {
> be/src/util/bit-util.h:if (LIKELY(CpuInfo::IsSupported(CpuInfo::POPCNT))) {
> be/src/util/bloom-filter.cc:  if (CpuInfo::IsSupported(CpuInfo::AVX)) {
> be/src/util/bloom-filter.h:  if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> be/src/util/bloom-filter.h:  if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
> be/src/util/cpu-info.cc:  if (!CpuInfo::IsSupported(CpuInfo::SSSE3)) {
> be/src/util/cpu-info.h:  ///   // line, CpuInfo::IsSupported(CpuInfo::AVX2) will return false.
> be/src/util/cpu-info.h:  : feature_(feature), reenable_(CpuInfo::IsSupported(feature)) {
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2));
> be/src/util/hash-util.h:if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) {
> be/src/util/openssl-util.cc:  return (CpuInfo::IsSupported(CpuInfo::PCLMULQDQ)
> {noformat}
> We also ship two versions of the codegen module, one of which (nosse42) is 
> essentially never used.
> I think it would be uncontroversial to bump the minimum requirement to 
> SSE4.2, which would allow us to delete some old fallbacks. I think the last 
> time Intel or AMD shipped a processor without this was 2010 or 2011. Jumping 
> to AVX is probably almost as uncontroversial, since it looks like that has 
> been universal for nearly as long: 
> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX.
> Some older lower-performance cloud instance types don't support AVX, it looks 
> like, but I think this is an edge case. 
> It would be very nice to require AVX2 because that could remove a bunch of 
> conditional code (I think the contributors adding ARM support might want to 
> keep the scalar fallback code though, potentially). It looks like most Intel 
> and AMD processors have supported it since 2013 and 2015 respectively, except 
> some low-end Intel processors: 
>

[jira] [Assigned] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2021-02-01 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2138:
-

Assignee: (was: Tim Armstrong)

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
>Reporter: Ippokratis Pandis
>Priority: Major
>  Labels: performance
> Attachments: 0001-Projection-prototype.patch, performance_result.txt
>
>
> It would be a very good performance improvement if we were able to get rid of 
> columns as soon as we know that they are not going to be used by any other 
> operators upstream. The amount of data we handle would shrink, making the 
> network and I/O (spilling) transfers more efficient. It would also improve 
> cache performance. 
> The current row-wise in-memory format does not make it very easy to get rid 
> of such unused columns. However, there are points of materialization where we 
> copy-out the tuples and we can actually perform these projections. There are 
> multiple points of materialization, notably:
> * The exchange operator
> * The build side of hash join
> * The probe side of hash join when we have spilling
> * The aggregation
> * Sorts and analytic function evaluation
> In order to do these projections we need to modify the FE so that it knows, 
> for each operator, the minimum set of columns referenced by that operator 
> and all the upstream ones. (That minimum set is very easy to calculate 
> during an additional top-down traversal of the plan.) We also need 
> to modify the BE and make the copy-out operation aware of such projections.
> Assigning first to Alex, because of the needed FE changes. Happy to take care 
> of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, 
> the FE and the BE changes.
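
The top-down traversal mentioned in the description could look roughly like the Python sketch below. It is a simplified model, not Impala frontend code: PlanNode, its fields, and the example operators are invented for illustration, and it ignores details such as columns that an operator produces itself (aggregate outputs, for example) or children that cannot supply a given column.

```python
class PlanNode:
    """Toy plan operator: `referenced` holds the columns its own exprs read."""

    def __init__(self, name, referenced=(), children=()):
        self.name = name
        self.referenced = frozenset(referenced)
        self.children = list(children)
        # Columns this node must still materialize for its consumers.
        self.output_columns = frozenset()


def assign_required_columns(node, needed_by_consumers=frozenset()):
    """Walk from the plan root down, recording what each node must emit."""
    node.output_columns = frozenset(needed_by_consumers)
    # A child must supply what this node reads plus what it passes through.
    needed_from_children = node.output_columns | node.referenced
    for child in node.children:
        assign_required_columns(child, needed_from_children)


# Usage: a scan filtered on "c", a join on "a", and a sink that returns "b".
scan = PlanNode("scan", referenced={"c"})
join = PlanNode("hash join", referenced={"a"}, children=[scan])
exchange = PlanNode("exchange", children=[join])
sink = PlanNode("plan root sink", referenced={"b"}, children=[exchange])
assign_required_columns(sink)
```

In this toy plan the scan only needs to emit "a" and "b", and the join and the exchange only need to carry "b", so "a" and "c" can be dropped at the first point of materialization after their last use.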




[jira] [Assigned] (IMPALA-2268) implicit casting of string to timestamp for functions

2021-01-31 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2268:
-

Assignee: (was: Tim Armstrong)

> implicit casting of string to timestamp for functions
> -
>
> Key: IMPALA-2268
> URL: https://issues.apache.org/jira/browse/IMPALA-2268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2
>Reporter: Bharath Vissapragada
>Priority: Minor
>  Labels: newbie, usability
>
> Consider the date_add() builtin. A string argument is automatically cast to a timestamp.
> {code}
> select date_add( "1900-01-01", 1 ) ; 
> Query: select date_add( "1900-01-01", 1 ) 
> +---------------------------+
> | date_add('1900-01-01', 1) |
> +---------------------------+
> | 1900-01-02 00:00:00       |
> +---------------------------+
> Fetched 1 row(s) in 0.12s 
> {code}
> However with an "interval"
> {code}
> select date_add( '1900-01-01', interval 72 days ) ; 
> Query: select date_add( '1900-01-01', interval 72 days ) 
> ERROR: AnalysisException: Operand ''1900-01-01'' of timestamp arithmetic 
> expression 'DATE_ADD('1900-01-01', INTERVAL 72 days)' returns type 'STRING'. 
> Expected type 'TIMESTAMP'. 
> {code}
> We need to manually cast it to a timestamp, something like,
> {code}
> select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) ; 
> Query: select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) 
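> +-------------------------------------------------------------+
> | date_add(cast('1900-01-01' as timestamp), interval 10 days) |
> +-------------------------------------------------------------+
> | 1900-01-11 00:00:00                                         |
> +-------------------------------------------------------------+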
> Fetched 1 row(s) in 0.02s 
> {code}
> It would be convenient to make this behavior consistent across all builtins.




[jira] [Resolved] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification

2021-01-29 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8306.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Debug WebUI's Sessions page verbiage clarification
> --
>
> Key: IMPALA-8306
> URL: https://issues.apache.org/jira/browse/IMPALA-8306
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Vincent Tran
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: supportability
> Fix For: Impala 4.0
>
> Attachments: sessions.png
>
>
> Currently, the Debug WebUI's Sessions page captures both active sessions and 
> expired sessions. At the top of the page there is a message along the lines of:
> {noformat}
> There are {{num_sessions}} sessions, of which {{num_active}} are active. 
> Sessions may be closed either when they are idle for some time (see Idle 
> Timeout
> below), or if they are deliberately closed, otherwise they are called active.
> {noformat}
> This text is ambiguous to me. If all non-active sessions are expired 
> sessions, it should explicitly tell the user that. And since an active 
> session becomes an expired session when it breaches the Session Idle Timeout, 
> the second sentence is also somewhat misleading. The user has to "deliberately 
> close" both active sessions and expired sessions to close them.




[jira] [Commented] (IMPALA-1652) Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed column.

2021-01-27 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273369#comment-17273369
 ] 

Tim Armstrong commented on IMPALA-1652:
---

[~stigahuang] it does require some additional logic in the CHAR->STRING cast - 
https://gerrit.cloudera.org/#/c/16339/3/be/src/exprs/cast-functions-ir.cc. But 
it doesn't require additional casts or any additional copies. I haven't 
measured but I think it would be very cheap in practice.

> Maybe we can store the actual length of each CHAR value in the tuple layout, 
> and calculate the actual length once when materializing the value
I think if we're going to change the slot layout, it's probably easier to 
treat it as variable-length and store it in a StringValue.

> Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed 
> column.
> ---
>
> Key: IMPALA-1652
> URL: https://issues.apache.org/jira/browse/IMPALA-1652
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.1, Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: correctness, downgraded, usability
> Attachments: 8be18d4.diff
>
>
> Repro:
> {code}
> create table foo(col1 char(10));
> insert into foo values (cast('test1' as char(10)));
> select * from foo where col1 = 'test1'; <-- returns an empty result set
> select * from foo where col1 = cast('test1' as char(10)); <-- correctly returns 1 row
> {code}
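
The repro is consistent with CHAR(N) values being stored padded to their declared length and then compared byte-for-byte against an unpadded string literal. The Python sketch below models only that assumed behaviour; it is not Impala code, and the helper names are invented. It shows why the first predicate would miss the row while a comparison that ignores trailing padding would find it.

```python
def to_char(value, n):
    """Model a CHAR(n) value: truncate or space-pad to exactly n characters."""
    return value[:n].ljust(n)


def naive_equal(char_value, string_literal):
    """Byte-for-byte comparison that treats the padding as significant."""
    return char_value == string_literal


def padding_aware_equal(char_value, string_literal):
    """Comparison that ignores trailing spaces on both sides."""
    return char_value.rstrip(" ") == string_literal.rstrip(" ")


stored = to_char("test1", 10)  # what the INSERT in the repro would store
```

Under this model, naive_equal(stored, "test1") is false, matching the empty result in the repro, whereas comparing against to_char("test1", 10) or using padding_aware_equal succeeds.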




[jira] [Resolved] (IMPALA-10404) Update docs to reflect RLE_DICTIONARY support

2021-01-26 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10404.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Update docs to reflect RLE_DICTIONARY support
> -
>
> Key: IMPALA-10404
> URL: https://issues.apache.org/jira/browse/IMPALA-10404
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>





[jira] [Assigned] (IMPALA-9726) Update boilerplate in the PyPI sidebar for impala-shell supported versions

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9726:
-

Assignee: (was: Tim Armstrong)

> Update boilerplate in the PyPI sidebar for impala-shell supported versions
> --
>
> Key: IMPALA-9726
> URL: https://issues.apache.org/jira/browse/IMPALA-9726
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Affects Versions: Impala 4.0
>Reporter: David Knupp
>Priority: Minor
>
> The following lines need to be updated to reflect that the shell now supports 
> Python 2.7+ and 3+.
> https://github.com/apache/impala/blob/master/shell/packaging/setup.py#L164-167
> {noformat}
> 'Programming Language :: Python :: 2 :: Only',
> 'Programming Language :: Python :: 2.6',
> 'Programming Language :: Python :: 2.7',
> {noformat}
> Note that this has no effect on the actual installation. The following line 
> is what manages that, and its value is correct for both Impala 3.4.0 and 
> Impala 4.0:
> https://github.com/apache/impala/blob/master/shell/packaging/setup.py#L138
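
One way the sidebar entries could be adjusted is sketched below in Python. The starting list is the three lines quoted in the description; the replacement entries are an assumption about what "2.7+ and 3+" should translate to, not the change that was actually committed, and the helper name is invented.

```python
# The three classifier lines quoted in the issue description.
OLD_CLASSIFIERS = [
    'Programming Language :: Python :: 2 :: Only',
    'Programming Language :: Python :: 2.6',
    'Programming Language :: Python :: 2.7',
]

# Entries that no longer describe the shell (assumed from "2.7+ and 3+").
STALE = {
    'Programming Language :: Python :: 2 :: Only',
    'Programming Language :: Python :: 2.6',
}

# Entries the sidebar should carry afterwards (assumed).
WANTED = [
    'Programming Language :: Python :: 2.7',
    'Programming Language :: Python :: 3',
]


def updated_classifiers(classifiers):
    """Drop stale entries, keep the rest in order, append any missing ones."""
    result = [c for c in classifiers if c not in STALE]
    for entry in WANTED:
        if entry not in result:
            result.append(entry)
    return result
```

As the description notes, these strings only affect what PyPI displays; installation constraints are governed separately.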




[jira] [Work started] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8306 started by Tim Armstrong.
-
> Debug WebUI's Sessions page verbiage clarification
> --
>
> Key: IMPALA-8306
> URL: https://issues.apache.org/jira/browse/IMPALA-8306
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Vincent Tran
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: supportability
> Attachments: sessions.png
>
>
> Currently, the Debug WebUI's Sessions page captures both active sessions and 
> expired sessions. At the top of the page there is a message along the lines of:
> {noformat}
> There are {{num_sessions}} sessions, of which {{num_active}} are active. 
> Sessions may be closed either when they are idle for some time (see Idle 
> Timeout
> below), or if they are deliberately closed, otherwise they are called active.
> {noformat}
> This text is ambiguous to me. If all non-active sessions are expired 
> sessions, it should explicitly tell the user that. And since an active 
> session becomes an expired session when it breaches the Session Idle Timeout, 
> the second sentence is also somewhat misleading. The user has to "deliberately 
> close" both active sessions and expired sessions to close them.




[jira] [Commented] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification

2021-01-25 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271701#comment-17271701
 ] 

Tim Armstrong commented on IMPALA-8306:
---

[~thundergun] I had a go at improving this here - 
http://gerrit.cloudera.org:8080/16981. Would welcome your feedback.

> Debug WebUI's Sessions page verbiage clarification
> --
>
> Key: IMPALA-8306
> URL: https://issues.apache.org/jira/browse/IMPALA-8306
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Vincent Tran
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: supportability
> Attachments: sessions.png
>
>
> Currently, the Debug WebUI's Sessions page captures both active sessions and 
> expired sessions. At the top of the page there is a message along the lines of:
> {noformat}
> There are {{num_sessions}} sessions, of which {{num_active}} are active. 
> Sessions may be closed either when they are idle for some time (see Idle 
> Timeout
> below), or if they are deliberately closed, otherwise they are called active.
> {noformat}
> This text is ambiguous to me. If all non-active sessions are expired 
> sessions, it should explicitly tell the user that. And since an active 
> session becomes an expired session when it breaches the Session Idle Timeout, 
> the second sentence is also somewhat misleading. The user has to 
> "deliberately close" both active sessions and expired sessions to close them.




[jira] [Resolved] (IMPALA-3657) Permission upon insert are wrong in hive warehouse table files

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3657.
---
Resolution: Not A Bug

> Permission upon insert are wrong in hive warehouse table files
> --
>
> Key: IMPALA-3657
> URL: https://issues.apache.org/jira/browse/IMPALA-3657
> Project: IMPALA
>  Issue Type: Bug
>  Components: Security
>Affects Versions: Impala 2.2.3
> Environment: Cluster is Kerberized and has sentry
>Reporter: Bala Chander
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: security
>
> Found an issue with permissions on warehouse.
> The Warehouse /user/hive/warehouse was set to owner hive:hive with 771 
> permissions recursively. User was granted write privilege on table (tbl-1) on 
> database (db-1).
> Initially all grants were done with beeline.
> Next the user switched to impala-shell and inserted some data into tbl-1. The 
> ownership and permissions on the new HDFS file were the following:
> ownership :  impala:hive
> permissions:  751 i.e. read and execute on group.
> The user cannot use insert overwrite via beeline since the group hive has 
> read-only permissions.
> The documentation: 
> http://www.cloudera.com/documentation/enterprise/latest/topics/impala_insert.html
>  has the following:
> Related startup options:
> By default, if an INSERT statement creates any new subdirectories underneath 
> a partitioned table, those subdirectories are assigned default HDFS 
> permissions for the impala user. To make each subdirectory have the same 
> permissions as its parent directory in HDFS, specify the 
> --insert_inherit_permissions startup option for the impalad daemon.




[jira] [Commented] (IMPALA-3657) Permission upon insert are wrong in hive warehouse table files

2021-01-25 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271655#comment-17271655
 ] 

Tim Armstrong commented on IMPALA-3657:
---

This is controlled by the fs.permissions.umask-mode setting in hdfs-site.xml, 
which defaults to 022. It could make sense to change it to 002 if you're in a 
setup like this where Impala is in the hive group. This is probably not 
something that needs to be fixed in Apache Impala, but rather in management 
software that sets up users/groups etc.
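
As a rough illustration of the umask arithmetic behind the comment above, the following sketch applies a umask to a requested mode using plain bit masking. It assumes the requested mode is the 771 that was set on the warehouse directory, and `apply_umask` is a hypothetical helper, not an HDFS API.

```python
def apply_umask(requested_mode: int, umask: int) -> int:
    """Return the permission bits that remain after clearing the umask bits."""
    return requested_mode & ~umask & 0o777


# 771 requested with the default umask 022 loses group write.
print(oct(apply_umask(0o771, 0o022)))  # 0o751
# With umask 002 the group keeps its write bit.
print(oct(apply_umask(0o771, 0o002)))  # 0o771
```

Under that assumption, the 751 mode reported in the issue is what umask 022 would produce, and 002 would leave group write in place.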

> Permission upon insert are wrong in hive warehouse table files
> --
>
> Key: IMPALA-3657
> URL: https://issues.apache.org/jira/browse/IMPALA-3657
> Project: IMPALA
>  Issue Type: Bug
>  Components: Security
>Affects Versions: Impala 2.2.3
> Environment: Cluster is Kerberized and has sentry
>Reporter: Bala Chander
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: security
>
> Found an issue with permissions on warehouse.
> The Warehouse /user/hive/warehouse was set to owner hive:hive with 771 
> permissions recursively. User was granted write privilege on table (tbl-1) on 
> database (db-1).
> Initially all grants were done with beeline.
> Next the user switched to impala-shell and inserted some data into tbl-1. The 
> ownership and permissions on the new HDFS file were the following:
> ownership :  impala:hive
> permissions:  751 i.e. read and execute on group.
> The user cannot use insert overwrite via beeline since the group hive has 
> read-only permissions.
> The documentation: 
> http://www.cloudera.com/documentation/enterprise/latest/topics/impala_insert.html
>  has the following:
> Related startup options:
> By default, if an INSERT statement creates any new subdirectories underneath 
> a partitioned table, those subdirectories are assigned default HDFS 
> permissions for the impala user. To make each subdirectory have the same 
> permissions as its parent directory in HDFS, specify the 
> --insert_inherit_permissions startup option for the impalad daemon.




[jira] [Commented] (IMPALA-1652) Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed column.

2021-01-25 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271638#comment-17271638
 ] 

Tim Armstrong commented on IMPALA-1652:
---

I had a WIP here - https://gerrit.cloudera.org/#/c/16339/ that illustrated how 
it might be possible to tweak

> Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed 
> column.
> ---
>
> Key: IMPALA-1652
> URL: https://issues.apache.org/jira/browse/IMPALA-1652
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.1, Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: correctness, downgraded, usability
> Attachments: 8be18d4.diff
>
>
> Repro:
> {code}
> create table foo(col1 char(10));
> insert into foo values (cast('test1' as char(10)));
> select * from foo where col1 = 'test1'; <-- returns an empty result set
> select * from foo where col1 = cast('test1' as char(10)); <-- correctly 
> returns 1 row
> {code}




[jira] [Commented] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2021-01-25 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271635#comment-17271635
 ] 

Tim Armstrong commented on IMPALA-2138:
---

Abandoned - https://gerrit.cloudera.org/#/c/14216/  
https://gerrit.cloudera.org/#/c/14399/1

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
>Reporter: Ippokratis Pandis
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: performance
> Attachments: 0001-Projection-prototype.patch, performance_result.txt
>
>
> It would be a very good performance improvement if we were able to get rid of 
> columns as soon as we know that they are not going to be used by any other 
> operators upstream. The amount of data we handle would shrink, making the 
> network and I/O (spilling) transfers more efficient. It would also improve 
> cache performance. 
> The current row-wise in-memory format does not make it very easy to get rid 
> of such unused columns. However, there are points of materialization where we 
> copy-out the tuples and we can actually perform these projections. There are 
> multiple points of materialization, notably:
> * The exchange operator
> * The build side of hash join
> * The probe side of hash join when we have spilling
> * The aggregation
> * Sorts and analytic function evaluation
> In order to do these projections we need to modify the FE so that it knows, 
> for each operator, the minimum set of columns referenced by that operator 
> and all the upstream ones. (That minimum set is very easy to calculate 
> during an additional top-down traversal of the plan.) We also need 
> to modify the BE and make the copy-out operation aware of such projections.
> Assigning first to Alex, because of the needed FE changes. Happy to take care 
> of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, 
> the FE and the BE changes.
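
The top-down traversal described above can be sketched as follows. This is a simplified model, not Impala's planner: `PlanNode` and `required_columns` are hypothetical names, columns are plain strings, and the sketch ignores columns that an operator produces itself (for example aggregate outputs).

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    referenced: set                      # columns this operator itself reads
    children: list = field(default_factory=list)


def required_columns(root: PlanNode) -> dict:
    """Map each operator to the columns referenced by it or by any
    operator above it (its consumers), computed in one top-down pass."""
    needed = {}

    def visit(node, needed_above):
        needed_here = needed_above | node.referenced
        needed[node.name] = needed_here
        for child in node.children:
            visit(child, needed_here)

    visit(root, set())
    return needed


# sort(a) <- exchange <- aggregate(a, b) <- scan with a predicate on c
scan = PlanNode("scan", {"c"})
agg = PlanNode("aggregate", {"a", "b"}, [scan])
exchange = PlanNode("exchange", set(), [agg])
sort = PlanNode("sort", {"a"}, [exchange])

for name, cols in required_columns(sort).items():
    print(name, sorted(cols))
```

In this toy plan the exchange only needs to carry column `a`, so a projection at that materialization point could drop everything else.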




[jira] [Updated] (IMPALA-7885) Create function to convert to ts from unix millis

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7885:
--
Labels: ramp-up  (was: )

> Create function to convert to ts from unix millis
> -
>
> Key: IMPALA-7885
> URL: https://issues.apache.org/jira/browse/IMPALA-7885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: eugen yushin
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: ramp-up
>
> There are several functions like 
> `from_unixtime`/`unix_micros_to_utc_timestamp`/`to_timestamp` in Impala that 
> accept seconds and micros, but none of them work with millis.
> At the same time, Impala already has all the utility methods necessary to add 
> such functionality:
> [https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-value.inline.h#L54]
> {code}
> inline TimestampValue TimestampValue::UtcFromUnixTimeMillis(
>     int64_t unix_time_millis) {
>   return UtcFromUnixTimeTicks<MILLIS_PER_SEC>(unix_time_millis);
> }
> {code}
> https://github.com/apache/impala/blob/master/be/src/exprs/timestamp-functions-ir.cc#L141
> {code}
> TimestampVal TimestampFunctions::UnixMicrosToUtcTimestamp(FunctionContext* context,
>     const BigIntVal& unix_time_micros) {
>   if (unix_time_micros.is_null) return TimestampVal::null();
>   TimestampValue tv = TimestampValue::UtcFromUnixTimeMicros(unix_time_micros.val);
>   TimestampVal result;
>   tv.ToTimestampVal(&result);
>   return result;
> }
> {code}
> It would be better to have a Unix millis to timestamp conversion function as 
> built-in functionality to avoid:
> - creating cumbersome 'aliases' like:
> {code}
> select unix_micros_to_utc_timestamp(1513895588243 * 1000)
> {code}
> or
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Why-not-from-unixtime-function-handles-an-unix-timestamp-in/m-p/63182#M3969
> {code}
> select cast(1513895588243 div 1000 as timestamp) + interval (1513895588243 % 
> 1000) milliseconds;
> {code}
> - writing relatively slow udfs in java
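
For reference, the conversion the requested function would perform can be sketched outside Impala as below. The name `unix_millis_to_utc_timestamp` is hypothetical and only mirrors the naming of the existing micros function; the sample value is the one used in the workarounds above.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_millis_to_utc_timestamp(unix_time_millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC timestamp.

    Integer timedelta arithmetic avoids the float rounding that dividing
    by 1000 and calling fromtimestamp() could introduce.
    """
    return EPOCH_UTC + timedelta(milliseconds=unix_time_millis)


print(unix_millis_to_utc_timestamp(1513895588243).isoformat())
```

The millisecond part is preserved directly, which is what the `* 1000` and `div`/`%` workarounds emulate by hand.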




[jira] [Assigned] (IMPALA-9457) Lazy start of disk threads in I/O manager

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9457:
-

Assignee: (was: Tim Armstrong)

> Lazy start of disk threads in I/O manager
> -
>
> Key: IMPALA-9457
> URL: https://issues.apache.org/jira/browse/IMPALA-9457
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: ramp-up, supportability
>
> Currently DiskIoMgr starts all the I/O threads upfront for all supported 
> filesystems. This means there are 100s of idle threads in most impalads that 
> never do anything. It would be sensible to start the threads for a disk only 
> when the first range is submitted. It's not immediately obvious where the 
> best place to do this is. A couple of ideas:
> * Try to do it in ScheduleContext in a lightweight way, e.g. check an atomic 
> to see if it's been initialised, then acquire a lock and create the threads 
> if needed.  Propagating the status if thread creation fails may be the tricky 
> part
> * Start up one thread per disk, so I/O can always make progress, and start an 
> extra thread per disk each time a range is pulled off the queue in 
> DiskQueue::GetNextRequestRange() so that the number of threads ramps up as 
> scan ranges are submitted. It could potentially be clever and try to track 
> how many threads are parked and only create new threads if 0 threads are 
> parked.
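
The first idea (check a flag, then take a lock and create the threads if needed) can be sketched as below. `LazyDiskQueue` is a hypothetical stand-in and not Impala's DiskQueue; scan ranges are modelled as plain callables, and a partially failed startup is not cleaned up, which is the tricky part the description mentions.

```python
import threading


class LazyDiskQueue:
    """Starts its worker threads on the first submitted range."""

    def __init__(self, num_threads: int):
        self._num_threads = num_threads
        self._started = False              # cheap flag read on the fast path
        self._init_lock = threading.Lock()
        self._work_available = threading.Condition()
        self._ranges = []
        self._shutdown = False
        self.threads = []

    def _ensure_threads_started(self):
        if self._started:                  # fast path once initialised
            return
        with self._init_lock:
            if self._started:              # re-check under the lock
                return
            for _ in range(self._num_threads):
                t = threading.Thread(target=self._worker, daemon=True)
                t.start()                  # a failure here reaches the caller
                self.threads.append(t)
            self._started = True

    def submit(self, scan_range):
        self._ensure_threads_started()
        with self._work_available:
            self._ranges.append(scan_range)
            self._work_available.notify()

    def _worker(self):
        while True:
            with self._work_available:
                while not self._ranges and not self._shutdown:
                    self._work_available.wait()
                if not self._ranges:       # shutdown requested, queue drained
                    return
                scan_range = self._ranges.pop(0)
            scan_range()                   # the "I/O" in this sketch

    def close(self):
        with self._work_available:
            self._shutdown = True
            self._work_available.notify_all()
        for t in self.threads:
            t.join()
```

A queue that never receives a range never creates a thread, which is the saving the issue is after; the cost is a flag check on every submit.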




[jira] [Resolved] (IMPALA-10229) Analytic limit pushdown optimization can be applied incorrectly based on predicates present

2021-01-25 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10229.

Fix Version/s: Impala 4.0
   Resolution: Fixed

Finished both subtasks

> Analytic limit pushdown optimization can be applied incorrectly based on 
> predicates present
> ---
>
> Key: IMPALA-10229
> URL: https://issues.apache.org/jira/browse/IMPALA-10229
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0
>
>
> {noformat}
> [localhost.EXAMPLE.COM:21050] default> select * from (select month, id, 
> rank() over (partition by month order by id desc) rnk from 
> functional_parquet.alltypes WHERE month >= 11) v order by month, id limit 3;
> +-------+------+-----+
> | month | id   | rnk |
> +-------+------+-----+
> | 11    | 6987 | 3   |
> | 11    | 6988 | 2   |
> | 11    | 6989 | 1   |
> +-------+------+-----+
> Fetched 3 row(s) in 4.16s
> {noformat}
> These are not the top 3 rows when ordering by month, id. Hive's result is 
> correct:
> {noformat}
> +----------+-------+--------+
> | v.month  | v.id  | v.rnk  |
> +----------+-------+--------+
> | 11       | 3040  | 600    |
> | 11       | 3041  | 599    |
> | 11       | 3042  | 598    |
> +----------+-------+--------+
> {noformat}
> I think that when there are no select predicates, the ordering in the analytic 
> sort needs to exactly match the TOP N sort ordering. I'm not sure if there 
> are fixes needed for the case where there are select predicates.
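
A toy model of the failure mode, using synthetic rows rather than functional_parquet.alltypes: the analytic sort orders by id descending while the outer TOP N orders by id ascending, so keeping only the first N rows of the analytic sort discards the rows the outer query needs. The function names are hypothetical and this is not a description of Impala's planner code.

```python
def rank_desc(rows):
    """rows: list of (month, id). Returns (month, id, rnk) where rnk is
    rank() over (partition by month order by id desc); ids are unique."""
    out = []
    for month in sorted({m for m, _ in rows}):
        ids = sorted((i for m, i in rows if m == month), reverse=True)
        out.extend((month, i, rnk) for rnk, i in enumerate(ids, start=1))
    return out


def correct_top_n(rows, n):
    # Rank every row first, then apply ORDER BY month, id LIMIT n.
    return sorted(rank_desc(rows))[:n]


def unsafe_pushdown_top_n(rows, n):
    # Incorrect: keep only the first n rows of each partition in the
    # analytic sort order (id DESC), then apply the outer ORDER BY.
    survivors = [r for r in rank_desc(rows) if r[2] <= n]
    return sorted(survivors)[:n]


rows = [(11, i) for i in range(1, 11)]     # synthetic data, ids 1..10
print(correct_top_n(rows, 3))
print(unsafe_pushdown_top_n(rows, 3))
```

The unsafe variant returns the rows with ranks 3, 2, 1, the same shape as the wrong Impala output above, while the correct variant returns the lowest ids with the highest ranks.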




[jira] [Created] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg

2021-01-25 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-10453:
--

 Summary: Support file/partition pruning via runtime filters on 
Iceberg
 Key: IMPALA-10453
 URL: https://issues.apache.org/jira/browse/IMPALA-10453
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong


This is a placeholder to figure out what we'd need to do to support dynamic 
file-level pruning in Iceberg using runtime filters, i.e. have parity for 
partition pruning.

* If there is a single partition value per file, then applying bloom filters to 
the row group stats would be effective at pruning files.
* If there are partition transforms, e.g. hash-based, then I think we probably 
need to track the partition that the file is associated with and then have some 
custom logic in the parquet scanner to do partition pruning.




[jira] [Resolved] (IMPALA-10296) Fix analytic limit pushdown when predicates are present

2021-01-24 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10296.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Fix analytic limit pushdown when predicates are present
> ---
>
> Key: IMPALA-10296
> URL: https://issues.apache.org/jira/browse/IMPALA-10296
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0
>
>
> This is to fix case 1 of the parent JIRA.




[jira] [Commented] (IMPALA-10448) observability/test_profile_tool.py fails missing impala-profile-tool during Docker-based tests

2021-01-19 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268289#comment-17268289
 ] 

Tim Armstrong commented on IMPALA-10448:


Looks like I missed adding this in there.

> observability/test_profile_tool.py fails missing impala-profile-tool during 
> Docker-based tests
> --
>
> Key: IMPALA-10448
> URL: https://issues.apache.org/jira/browse/IMPALA-10448
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Laszlo Gaal
>Priority: Major
>
> The test executable {{impala-profile-tool}} is missing during the execution 
> of the test suite EE_TEST_PARALLEL. This is specific to Docker-based tests 
> (driven by docker/test-with-docker.py), because in that environment test 
> executables are built only in the container executing the suite BE_TEST. All 
> other containers receive only the core Impala binaries as built by
> {code}
> buildall.sh -notests
> {code}




[jira] [Assigned] (IMPALA-9458) Improve runtime profile counters for slow IO from remote stores

2021-01-15 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9458:
-

Assignee: (was: Tim Armstrong)

> Improve runtime profile counters for slow IO from remote stores
> ---
>
> Key: IMPALA-9458
> URL: https://issues.apache.org/jira/browse/IMPALA-9458
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: observability
>
> Remote storage systems (e.g. cloud stores like S3 and ABFS) often have long 
> tail latencies. Most I/O finishes relatively quickly, but some calls may 
> take significantly longer. Even for HDFS, this is an issue (e.g. hedged reads 
> were developed to help mitigate tail latencies, although no such feature 
> exists for cloud storage connectors).
> Currently, scan nodes just track the total amount of time spent reading data. 
> It would be good to have a summary stats counter that tracks the min, avg, 
> and max time spent reading data. This should at least allow us to identify 
> when calls to remote storage services are taking longer than usual.
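
A minimal sketch of the kind of counter described above, tracking min, avg and max instead of only a total. `SummaryStatsCounter` here is a hypothetical Python class for illustration, and the latency samples are made up.

```python
class SummaryStatsCounter:
    """Tracks count, min, max and average of observed values."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None

    def update(self, value):
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def avg(self):
        return self.total / self.count if self.count else None


read_time_ms = SummaryStatsCounter()
for latency in (12, 9, 15, 480):           # one slow outlier among fast reads
    read_time_ms.update(latency)
print(read_time_ms.min, read_time_ms.avg, read_time_ms.max)
```

A total of 516 ms alone would hide the outlier; the max of 480 ms next to a min of 9 ms makes the tail latency visible.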




[jira] [Commented] (IMPALA-10153) Support time travel for Iceberg tables

2021-01-15 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266219#comment-17266219
 ] 

Tim Armstrong commented on IMPALA-10153:


With Kudu, my understanding is that you could do temporal queries back until the 
ancient history marker, beyond which point per-row timestamps are no longer 
maintained - 
https://github.com/cloudera/kudu/blob/master/docs/design-docs/tablet-history-gc.md

> Support time travel for Iceberg tables
> --
>
> Key: IMPALA-10153
> URL: https://issues.apache.org/jira/browse/IMPALA-10153
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
>
> Iceberg tables support snapshots/data versioning/time travel.
> It means we can query an older version of the table.
> Probably we'll need to extend Impala's SQL syntax to support such queries 
> (Hive will also support such queries, so we should use the same syntax).




[jira] [Commented] (IMPALA-10153) Support time travel for Iceberg tables

2021-01-15 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266218#comment-17266218
 ] 

Tim Armstrong commented on IMPALA-10153:


[~patrickangeles] IMPALA-9773 is the JIRA

> Support time travel for Iceberg tables
> --
>
> Key: IMPALA-10153
> URL: https://issues.apache.org/jira/browse/IMPALA-10153
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
>
> Iceberg tables support snapshots/data versioning/time travel.
> It means we can query an older version of the table.
> Probably we'll need to extend Impala's SQL syntax to support such queries 
> (Hive will also support such queries, so we should use the same syntax).




[jira] [Resolved] (IMPALA-9865) Utility to pretty-print thrift profiles at various levels

2021-01-14 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9865.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Utility to pretty-print thrift profiles at various levels
> -
>
> Key: IMPALA-9865
> URL: https://issues.apache.org/jira/browse/IMPALA-9865
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: image-2020-11-20-09-25-08-082.png
>
>
> The prototyping work in IMPALA-9382 revealed some hard trade-offs between 
> having a full-fidelity text profile and readability.
> We want to have a text profile with less information by default so that it is 
> more readable, and rely on the thrift profile for more detailed debugging. 
> This would be easier if we provided a utility that can pretty-print a thrift 
> profile at different levels of detail.
> This JIRA is to reduce the default level of pretty-printing for aggregated 
> profiles, but provide a utility that can dump both the basic and full 
> versions. My thought is to start off with the same 4 levels as explain, but 
> maybe only implement 2 levels to start off with - basic and extended.
> The utility should be able to handle the same cases as 
> bin/parse-thrift-profile.py (profile log and single profile in a file) and 
> maybe print only a specified query from a  profile log.  We can use the 
> DeserializeFromArchiveString() method that was removed in IMPALA-9381, then 
> pretty-print the deserialised profile.




[jira] [Commented] (IMPALA-9486) Creating a Kudu table via JDBC fails with "IllegalArgumentException"

2021-01-11 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262890#comment-17262890
 ] 

Tim Armstrong commented on IMPALA-9486:
---

I wonder if IMPALA-10027 would prevent this in a different way, at least if the 
root cause is that an unauthenticated connection doesn't have a valid user set.

> Creating a Kudu table via JDBC fails with "IllegalArgumentException"
> 
>
> Key: IMPALA-9486
> URL: https://issues.apache.org/jira/browse/IMPALA-9486
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Grant Henke
>Assignee: Fang-Yu Rao
>Priority: Blocker
>
> A Kudu user reported that, although creating tables via impala-shell or Hue 
> works, the create statement fails with the following when issued from an 
> external tool connected via JDBC:
> {noformat}
> [ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, 
> SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:ImpalaRuntimeException: Error creating Kudu table 
> 'impala::default.foo' CAUSED BY: IllegalArgumentException: table owner must 
> not be null or empty ), Query: …
> {noformat}
>  
> When debugging the issue further it looks like the owner should not be set on 
> the Kudu table if an owner is not explicitly set:
> [https://github.com/apache/impala/blob/497a17dbdc0669abd47c2360b8ca94de8b54d413/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L252]
>  
> A possible fix could be to guard the call with _isSetOwner_:
> {code:java}
> if (msTbl.isSetOwner()) { 
>tableOpts.setOwner(msTbl.getOwner()); 
> }
> {code}




[jira] [Resolved] (IMPALA-9958) Implement Introsort by adding a heapsort case

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9958.
---
Resolution: Won't Do

This isn't an obvious win - we do a randomized median of three pivot selection 
that's fairly robust. I think we should look at the sort holistically instead 
of assuming this is the right solution.

> Implement Introsort by adding a heapsort case 
> --
>
> Key: IMPALA-9958
> URL: https://issues.apache.org/jira/browse/IMPALA-9958
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Shant Hovsepian
>Priority: Minor
>
> Introsort is the standard hybrid sort implementation 
> [https://en.wikipedia.org/wiki/Introsort] which chooses between quicksort, 
> heapsort, and insertion sort given the current sort run size.
>  
> Currently the Sorter uses quicksort with insertion sort for batches smaller 
> than 16. With introsort, when the quicksort recursion depth exceeds a 
> threshold of 2*log(N), the algorithm switches to heapsort.
> This should help mitigate worst-case pivot selections.
>  




[jira] [Resolved] (IMPALA-8900) Allow /healthz access without authentication

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8900.
---
Resolution: Duplicate

> Allow /healthz access without authentication
> 
>
> Key: IMPALA-8900
> URL: https://issues.apache.org/jira/browse/IMPALA-8900
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Lars Volker
>Priority: Major
>
> When enabling SPNEGO authentication for the debug webpages, /healthz becomes 
> unavailable. Some tooling might rely on the endpoint being accessible without 
> authentication and it does not pose a security risk to make it available.




[jira] [Updated] (IMPALA-7427) Write Impala version information to writer.model.name footer field of Parquet

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7427:
--
Labels: newbie parquet ramp-up  (was: parquet)

> Write Impala version information to writer.model.name footer field of Parquet
> -
>
> Key: IMPALA-7427
> URL: https://issues.apache.org/jira/browse/IMPALA-7427
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltan Ivanfi
>Priority: Minor
>  Labels: newbie, parquet, ramp-up
>
> PARQUET-352 added support for the "writer.model.name" property in the Parquet 
> metadata to identify the object model (application) that wrote the file.




[jira] [Resolved] (IMPALA-6979) BloomFilterBenchmark hits DCHECK

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6979.
---
Resolution: Later

Don't need to track this, if someone wants to use the benchmark they'll need to 
fix it.

> BloomFilterBenchmark hits DCHECK
> 
>
> Key: IMPALA-6979
> URL: https://issues.apache.org/jira/browse/IMPALA-6979
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tianyi Wang
>Priority: Minor
>
> Leaving this here in case someone else runs into it and needs to fix the 
> benchmark. We don't run this benchmark as part of builds so it's not a high 
> priority to fix.
> {noformat}
> F0504 15:18:55.533821 26709 bloom-filter.cc:192] Check failed: 
> !out->always_false 
> {noformat}




[jira] [Resolved] (IMPALA-6895) Reduce flush, close and open calls in SimpleLogger::flush()

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6895.
---
Resolution: Won't Do

> Reduce flush, close and open calls in SimpleLogger::flush()
> ---
>
> Key: IMPALA-6895
> URL: https://issues.apache.org/jira/browse/IMPALA-6895
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Zoram Thanga
>Priority: Minor
>  Labels: ramp-up
>
> Currently, SimpleLogger provides a Flush() interface which is used by its 
> client(s) to periodically (hard-coded to 5 seconds) flush the log file. We 
> could eliminate these flush threads by keeping track of last flush time, and 
> have the caller of SimpleLogger::AppendEntry() flush on demand (now - 
> last_flush_time >= 5 seconds or whatever).
> This has the added benefit of reducing contention on the 
> SimpleLogger::log_file_lock_ mutex to just between the threads adding entries 
> to the log file.
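
The flush-on-demand idea can be sketched as below. `OnDemandFlushLogger` is a hypothetical Python class, not Impala's SimpleLogger; the clock is injectable only so the behaviour can be exercised without sleeping.

```python
import io
import threading
import time


class OnDemandFlushLogger:
    """Flushes from the appending thread once the interval has elapsed."""

    def __init__(self, sink, flush_interval_s=5.0, clock=time.monotonic):
        self._sink = sink
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._lock = threading.Lock()      # contended only by appenders
        self._last_flush_time = clock()
        self.flush_count = 0

    def append_entry(self, entry: str):
        with self._lock:
            self._sink.write(entry + "\n")
            now = self._clock()
            if now - self._last_flush_time >= self._flush_interval_s:
                self._sink.flush()
                self._last_flush_time = now
                self.flush_count += 1


log = OnDemandFlushLogger(io.StringIO())
log.append_entry("query 1 started")
```

One trade-off of this design is that entries written after the last flush stay buffered until the next append arrives, so a quiet log could lag by more than the interval.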




[jira] [Resolved] (IMPALA-6555) Clean up relationship between DiskIoMgr::min_buffer_size_ and BufferPool::min_buffer_len_

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6555.
---
Resolution: Later

> Clean up relationship between DiskIoMgr::min_buffer_size_ and 
> BufferPool::min_buffer_len_
> -
>
> Key: IMPALA-6555
> URL: https://issues.apache.org/jira/browse/IMPALA-6555
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Minor
>
> They are always the same value in practice, obtained from --min_buffer_size. 
> We should probably get rid of DiskIoMgr::min_buffer_size_ and fix up all 
> references to it.




[jira] [Resolved] (IMPALA-6344) Optimize decimal multiplication

2021-01-08 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6344.
---
Resolution: Later

> Optimize decimal multiplication
> ---
>
> Key: IMPALA-6344
> URL: https://issues.apache.org/jira/browse/IMPALA-6344
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Taras Bobrovytsky
>Priority: Major
>  Labels: decimal, perf
>
> Our current implementation of decimal multiplication can be slow because the 
> code contains branches.
> [~zamsden] suggested using Karatsuba multiplication 
> ([https://en.wikipedia.org/wiki/Karatsuba_algorithm]) for the 
> int128 * int128 -> int256 multiply. The following example implements this and 
> uses 3 hardware 64-bit multiplies to get a full 256-bit result. The code is 
> written in inline assembly and has no branches.
> http://coliru.stacked-crooked.com/a/25a697389211189f
> We should consider benchmarking this code and using this approach if it turns 
> out to be faster.
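
For reference, here is a rough C++ sketch of the three-multiply idea. It is not the inline assembly from the link above and not Impala code: the names U256 and MulU128Karatsuba are hypothetical, it is unsigned only, it relies on the GCC/Clang unsigned __int128 extension, and it uses the subtractive Karatsuba variant with ordinary conditionals, so it makes no attempt to be branch-free.

```cpp
#include <cstdint>

// GCC/Clang extension, not standard C++ (an assumption of this sketch).
using u128 = unsigned __int128;

// 256-bit unsigned result as four 64-bit limbs, least significant first.
struct U256 {
  uint64_t limb[4];
};

// Hypothetical helper: full 128 x 128 -> 256 bit unsigned multiply using
// three 64 x 64 -> 128 bit multiplies (subtractive Karatsuba).
inline U256 MulU128Karatsuba(u128 a, u128 b) {
  const uint64_t a0 = static_cast<uint64_t>(a);
  const uint64_t a1 = static_cast<uint64_t>(a >> 64);
  const uint64_t b0 = static_cast<uint64_t>(b);
  const uint64_t b1 = static_cast<uint64_t>(b >> 64);

  // The three multiplies: low halves, high halves, and |a1 - a0| * |b1 - b0|.
  const u128 z0 = static_cast<u128>(a0) * b0;
  const u128 z2 = static_cast<u128>(a1) * b1;
  const bool neg_a = a1 < a0;
  const bool neg_b = b1 < b0;
  const uint64_t da = neg_a ? a0 - a1 : a1 - a0;
  const uint64_t db = neg_b ? b0 - b1 : b1 - b0;
  const u128 m = static_cast<u128>(da) * db;

  // Middle term: a0*b1 + a1*b0 = z0 + z2 - (a1 - a0)*(b1 - b0).
  // It can need 129 bits, so the top bit is tracked in 'mid_carry'.
  u128 mid = z0 + z2;
  uint64_t mid_carry = mid < z0;
  if (neg_a != neg_b) {
    // (a1 - a0)*(b1 - b0) is negative, so subtracting it adds m.
    mid += m;
    mid_carry += mid < m;
  } else {
    mid_carry -= mid < m;  // Borrow out of the low 128 bits.
    mid -= m;
  }

  // Combine: result = z0 + (middle << 64) + (z2 << 128).
  U256 r;
  r.limb[0] = static_cast<uint64_t>(z0);
  u128 acc = (z0 >> 64) + static_cast<uint64_t>(mid);
  r.limb[1] = static_cast<uint64_t>(acc);
  acc = (acc >> 64) + static_cast<uint64_t>(z2) + static_cast<uint64_t>(mid >> 64);
  r.limb[2] = static_cast<uint64_t>(acc);
  acc = (acc >> 64) + (z2 >> 64) + mid_carry;
  r.limb[3] = static_cast<uint64_t>(acc);
  return r;
}
```

Any benchmark against the existing implementation would also need to cover sign handling and the decimal scaling around the multiply, which this sketch leaves out.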



