[jira] [Created] (IMPALA-9116) SASL server fails when FQDN is greater than 63 characters long in Kudu RPC

2019-10-31 Thread Anurag Mantripragada (Jira)
Anurag Mantripragada created IMPALA-9116:


 Summary: SASL server fails when FQDN is greater than 63 characters 
long in Kudu RPC
 Key: IMPALA-9116
 URL: https://issues.apache.org/jira/browse/IMPALA-9116
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Anurag Mantripragada
 Fix For: Impala 3.4.0


In the current Kudu RPC implementation, we don't explicitly pass the host's 
FQDN into the SASL library. Due to an upstream SASL bug 
([https://github.com/cyrusimap/cyrus-sasl/issues/583]), the FQDN gets truncated 
when SASL tries to determine the server's principal, in the case where the 
server's FQDN is longer than 63 characters.

This results in startup failures: the preflight checks fail because the 
appropriate keytab entry cannot be found (after searching for a truncated host 
name).

To work around this, we should use our own code to compute the FQDN.

Kudu is making this change in its own implementation 
(https://issues.apache.org/jira/browse/KUDU-2989); we should do the same.
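
For illustration, a minimal sketch of the workaround idea (not the actual 
Kudu/Impala patch): resolve the canonical host name ourselves and pass it 
explicitly as the serverFQDN argument of sasl_server_new(), so the library 
never falls back to its own truncating lookup. The service name and error 
handling below are placeholders.

{code:cpp}
// Sketch only: compute the FQDN with getaddrinfo(AI_CANONNAME) instead of
// relying on the SASL library's internal, length-limited hostname lookup.
#include <netdb.h>
#include <unistd.h>
#include <cstring>
#include <string>
#include <sasl/sasl.h>

static std::string GetCanonicalHostname() {
  char hostname[256];
  if (gethostname(hostname, sizeof(hostname)) != 0) return "localhost";
  hostname[sizeof(hostname) - 1] = '\0';

  addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_flags = AI_CANONNAME;
  addrinfo* result = nullptr;
  if (getaddrinfo(hostname, nullptr, &hints, &result) != 0 || result == nullptr) {
    return hostname;  // Fall back to the short name if canonicalization fails.
  }
  std::string fqdn =
      result->ai_canonname != nullptr ? result->ai_canonname : hostname;
  freeaddrinfo(result);
  return fqdn;
}

// The computed FQDN is then passed explicitly (second argument) instead of
// NULL, so SASL does not derive and truncate it on its own:
//   sasl_conn_t* conn = nullptr;
//   sasl_server_new("impala", GetCanonicalHostname().c_str(),
//                   /*user_realm=*/nullptr, /*iplocalport=*/nullptr,
//                   /*ipremoteport=*/nullptr, /*callbacks=*/nullptr,
//                   /*flags=*/0, &conn);
{code}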



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964611#comment-16964611
 ] 

Manish Maheshwari commented on IMPALA-3933:
---

So can we do something about this? I know not many users work with such old 
dates, but some do, and this becomes one of the reasons to use Hive over 
Impala for BI queries.

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skews with convert_legacy_hive_parquet_utc_timestamps=true:
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper conversion) when reading dates earlier than 1900 
> (not sure about the exact cutoff date).
> The following example was run on a server in the CEST timezone, so the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps look good and the DST differences can be seen (Hive 
> inserted them in local time, but Impala shows them in UTC).
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (IMPALA-9032) Impala returns 0 rows over hs2-http without waiting for fetch_rows_timeout_ms timeout

2019-10-31 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964478#comment-16964478
 ] 

Sahil Takiar commented on IMPALA-9032:
--

The bug is here: 
[https://github.com/apache/impala/commit/151835116a7972b15a646f8eae6bd8a593bb3564#diff-56ca691d07bb5e79ea7a99aa180cbf91R136-R137]
 - {{PlanRootSink::fetch_rows_timeout_us()}} is in microseconds, but 
{{wait_timeout_timer.ElapsedTime()}} is in nanoseconds. This was fixed here: 
[https://github.com/apache/impala/commit/c47fca5960b5be1a8e2013c4c4ffe260e98a1bff#diff-56ca691d07bb5e79ea7a99aa180cbf91R143-R145]
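
A simplified sketch of the mismatch (illustrative, not the actual PlanRootSink 
code): a nanosecond elapsed time is compared against a microsecond timeout, so 
the non-blocking fetch gives up roughly 1000x too early.

{code:cpp}
#include <cstdint>

// Illustrative only. ElapsedTime() is in nanoseconds while
// fetch_rows_timeout_us() is in microseconds, so a direct comparison
// times out ~1000x sooner than intended.
constexpr int64_t NANOS_PER_MICRO = 1000;

bool FetchTimedOutBuggy(int64_t elapsed_ns, int64_t timeout_us) {
  return elapsed_ns >= timeout_us;  // unit mismatch: ns compared against us
}

bool FetchTimedOutFixed(int64_t elapsed_ns, int64_t timeout_us) {
  return elapsed_ns >= timeout_us * NANOS_PER_MICRO;  // both sides in ns
}
{code}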

> Impala returns 0 rows over hs2-http without waiting for fetch_rows_timeout_ms 
> timeout
> -
>
> Key: IMPALA-9032
> URL: https://issues.apache.org/jira/browse/IMPALA-9032
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Lars Volker
>Priority: Major
>
> This looks like a bug to me but I'm not entirely sure. I'm trying to run our 
> tests over hs2-http (IMPALA-8863) and after the change for IMPALA-7312 to 
> introduce a non-blocking mode for FetchResults() it looks like we sometimes 
> return an empty result way before {{fetch_rows_timeout_ms}} has elapsed. This 
> triggers a bug in Impyla 
> ([#369|https://github.com/cloudera/impyla/issues/369]), but it also seems 
> like something we should investigate and fix in Impala.
> {noformat}
> I1007 22:10:10.697760 56550 impala-hs2-server.cc:821] FetchResults(): 
> query_id=764d4313dbc64e20:2831560c fetch_size=1024I1007 
> 22:10:10.697760 56550 impala-hs2-server.cc:821] FetchResults(): 
> query_id=764d4313dbc64e20:2831560c fetch_size=1024I1007 
> 22:10:10.697988 56527 scheduler.cc:468] 6d4cba4d2e8ccc42:66ce26a8] 
> Exec at coord is falseI1007 22:10:10.698014 54090 impala-hs2-server.cc:663] 
> GetOperationStatus(): query_id=0d43fd73ce4403fd:da25dde9I1007 
> 22:10:10.698173   127 control-service.cc:142] 
> 0646e91fd6a0a953:02949ff3] ExecQueryFInstances(): 
> query_id=0646e91fd6a0a953:02949ff3 coord=b04a12d76e27:22000 
> #instances=1I1007 22:10:10.698356 56527 admission-controller.cc:1270] 
> 6d4cba4d2e8ccc42:66ce26a8] Trying to admit 
> id=6d4cba4d2e8ccc42:66ce26a8 in pool_name=root.default 
> executor_group_name=default per_host_mem_estimate=52.02 MB 
> dedicated_coord_mem_estimate=110.02 MB max_requests=-1 (configured 
> statically) max_queued=200 (configured statically) max_mem=29.30 GB 
> (configured statically)I1007 22:10:10.698386 56527 
> admission-controller.cc:1282] 6d4cba4d2e8ccc42:66ce26a8] Stats: 
> agg_num_running=9, agg_num_queued=0, agg_mem_reserved=8.34 GB,  
> local_host(local_mem_admitted=9.09 GB, num_admitted_running=9, num_queued=0, 
> backend_mem_reserved=6.70 GB)I1007 22:10:10.698415 56527 
> admission-controller.cc:871] 6d4cba4d2e8ccc42:66ce26a8] Admitting 
> query id=6d4cba4d2e8ccc42:66ce26a8I1007 22:10:10.698479 56527 
> impala-server.cc:1713] 6d4cba4d2e8ccc42:66ce26a8] Registering query 
> locationsI1007 22:10:10.698529 56527 coordinator.cc:97] 
> 6d4cba4d2e8ccc42:66ce26a8] Exec() 
> query_id=6d4cba4d2e8ccc42:66ce26a8 stmt=select count(*) from alltypes 
> where month=1I1007 22:10:10.698992 56527 coordinator.cc:361] 
> 6d4cba4d2e8ccc42:66ce26a8] starting execution on 3 backends for 
> query_id=6d4cba4d2e8ccc42:66ce26a8I1007 22:10:10.699383 56523 
> coordinator.cc:375] 0646e91fd6a0a953:02949ff3] started execution on 1 
> backends for query_id=0646e91fd6a0a953:02949ff3I1007 22:10:10.699409 
> 56534 scheduler.cc:468] e1495f928c2cd4f6:eeda82aa] Exec at coord is 
> falseI1007 22:10:10.700017   127 control-service.cc:142] 
> 6d4cba4d2e8ccc42:66ce26a8] ExecQueryFInstances(): 
> query_id=6d4cba4d2e8ccc42:66ce26a8 coord=b04a12d76e27:22000 
> #instances=1I1007 22:10:10.700147 56534 scheduler.cc:468] 
> e1495f928c2cd4f6:eeda82aa] Exec at coord is falseI1007 
> 22:10:10.700234   325 TAcceptQueueServer.cpp:340] New connection to server 
> hiveserver2-http-frontend from client I1007 
> 22:10:10.700286   329 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
> hiveserver2-http-frontend started connection setup for client  172.18.0.1 Port: 51580>I1007 22:10:10.700314   329 
> TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend 
> finished connection setup for client I1007 
> 22:10:10.700371 56550 impala-hs2-server.cc:844] FetchResults(): 
> query_id=764d4313dbc64e20:2831560c #results=1 has_more=trueI1007 
> 22:10:10.700508 56551 impala-server.cc:1969] Connection 
> 8249c7defcb10124:1bc65ed9ea562aab from client 172.18.0.1:51576 to server 
> hiveserver2-http-frontend closed. The 

[jira] [Resolved] (IMPALA-8959) test_union failed with wrong results on S3

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8959.
---
Resolution: Duplicate

I'm pretty sure that this has the same cause - I checked and the query was run via 
HS2, so we're exposed to the same Impyla bug.

> test_union failed with wrong results on S3
> --
>
> Key: IMPALA-8959
> URL: https://issues.apache.org/jira/browse/IMPALA-8959
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
>
> Error details
> {noformat}
> query_test/test_queries.py:77: in test_union 
> self.run_test_case('QueryTest/union', vector) 
> common/impala_test_suite.py:611: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:448: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 
> 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1 != None E 
> 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1 != None E 
> 1,false,1,1,1,10,1.10023841858,10.1,'01/01/09','1',2009-01-01 
> 00:01:00,2009,1 != None E 
> 1,false,1,1,1,10,1.10023841858,10.1,'01/01/09','1',2009-01-01 
> 00:01:00,2009,1 != None E 2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 
> 00:00:00,2009,2 != None E 2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 
> 00:00:00,2009,2 != None E 
> 3,false,1,1,1,10,1.10023841858,10.1,'02/01/09','1',2009-02-01 
> 00:01:00,2009,2 != None E 
> 3,false,1,1,1,10,1.10023841858,10.1,'02/01/09','1',2009-02-01 
> 00:01:00,2009,2 != None E 4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 
> 00:00:00,2009,3 != None E 
> 5,false,1,1,1,10,1.10023841858,10.1,'03/01/09','1',2009-03-01 
> 00:01:00,2009,3 != None E Number of rows returned (expected vs actual): 
> 10 != 0{noformat}
> Stack trace
> {noformat}
> query_test/test_queries.py:77: in test_union
> self.run_test_case('QueryTest/union', vector)
> common/impala_test_suite.py:611: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:448: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1 != None
> E 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1 != None
> E 1,false,1,1,1,10,1.10023841858,10.1,'01/01/09','1',2009-01-01 
> 00:01:00,2009,1 != None
> E 1,false,1,1,1,10,1.10023841858,10.1,'01/01/09','1',2009-01-01 
> 00:01:00,2009,1 != None
> E 2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2 != None
> E 2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2 != None
> E 3,false,1,1,1,10,1.10023841858,10.1,'02/01/09','1',2009-02-01 
> 00:01:00,2009,2 != None
> E 3,false,1,1,1,10,1.10023841858,10.1,'02/01/09','1',2009-02-01 
> 00:01:00,2009,2 != None
> E 4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3 != None
> E 5,false,1,1,1,10,1.10023841858,10.1,'03/01/09','1',2009-03-01 
> 00:01:00,2009,3 != None
> E Number of rows returned (expected vs actual): 10 != 0{noformat}
> {noformat}
> select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, 
> float_col, double_col, date_string_col, string_col, timestamp_col, year, 
> month from alltypestiny where year=2009 and month=1
> union all
>   (select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, 
> float_col, double_col, date_string_col, string_col, timestamp_col, year, 
> month from alltypestiny where year=2009 and month=1
>union all
>  (select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, 
> float_col, double_col, date_string_col, string_col, timestamp_col, year, 
> month from alltypestiny where year=2009 and month=2
>   union all
> (select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, 
> float_col, double_col, date_string_col, string_col, timestamp_col, year, 
> month from alltypestiny where year=2009 and month=2
>  union all
>  select id, bool_col, tinyint_col, 


[jira] [Updated] (IMPALA-9098) TestQueries.test_union failed

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9098:
--
Priority: Blocker  (was: Critical)

> TestQueries.test_union failed
> -
>
> Key: IMPALA-9098
> URL: https://issues.apache.org/jira/browse/IMPALA-9098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Andrew Sherman
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: profile_correct.txt, profile_incorrect.txt
>
>
> This happened once in an ASAN build. This *might* be a flaky test, like 
> IMPALA-8959 or it just might be a regression caused by IMPALA-8999
> {code}
> Error Message
> query_test/test_queries.py:77: in test_union 
> self.run_test_case('QueryTest/union', vector) 
> common/impala_test_suite.py:650: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:487: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 0,0 != 1,10 E 0,0 != 2,0 E   
>   0,0 != 224,80 E 1,10 != 252,90 E 1,10 != 3,10 E 1000,2000 != 
> 4,0 E 112,40 != 4,0 E 140,50 != 5,10 E 168,60 != 6,0 E 196,70 
> != 7,10 E 2,0 != None E 2,0 != None E 224,80 != None E 252,90 
> != None E 28,10 != None E 3,10 != None E 3,10 != None E 4,0 
> != None E 4,0 != None E 5,10 != None E 5,10 != None E 56,20 
> != None E 6,0 != None E 6,0 != None E 7,10 != None E 7,10 != 
> None E 84,30 != None E Number of rows returned (expected vs actual): 
> 27 != 10
> Stacktrace
> query_test/test_queries.py:77: in test_union
> self.run_test_case('QueryTest/union', vector)
> common/impala_test_suite.py:650: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:487: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,0 != 1,10
> E 0,0 != 2,0
> E 0,0 != 224,80
> E 1,10 != 252,90
> E 1,10 != 3,10
> E 1000,2000 != 4,0
> E 112,40 != 4,0
> E 140,50 != 5,10
> E 168,60 != 6,0
> E 196,70 != 7,10
> E 2,0 != None
> E 2,0 != None
> E 224,80 != None
> E 252,90 != None
> E 28,10 != None
> E 3,10 != None
> E 3,10 != None
> E 4,0 != None
> E 4,0 != None
> E 5,10 != None
> E 5,10 != None
> E 56,20 != None
> E 6,0 != None
> E 6,0 != None
> E 7,10 != None
> E 7,10 != None
> E 84,30 != None
> E Number of rows returned (expected vs actual): 27 != 10
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (IMPALA-9098) TestQueries.test_union failed

2019-10-31 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964471#comment-16964471
 ] 

Tim Armstrong commented on IMPALA-9098:
---

I found a good and a bad instance of the same query. It looks like the query 
produced at least 16 rows, but only 10 rows were returned by the client. 
Suspiciously, on this run BATCH_SIZE=10.

I have a theory that this is actually an Impyla bug triggered by us returning 0 
rows in some cases: https://github.com/cloudera/impyla/issues/369

> TestQueries.test_union failed
> -
>
> Key: IMPALA-9098
> URL: https://issues.apache.org/jira/browse/IMPALA-9098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Andrew Sherman
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: profile_correct.txt, profile_incorrect.txt
>
>
> This happened once in an ASAN build. This *might* be a flaky test, like 
> IMPALA-8959 or it just might be a regression caused by IMPALA-8999
> {code}
> Error Message
> query_test/test_queries.py:77: in test_union 
> self.run_test_case('QueryTest/union', vector) 
> common/impala_test_suite.py:650: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:487: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 0,0 != 1,10 E 0,0 != 2,0 E   
>   0,0 != 224,80 E 1,10 != 252,90 E 1,10 != 3,10 E 1000,2000 != 
> 4,0 E 112,40 != 4,0 E 140,50 != 5,10 E 168,60 != 6,0 E 196,70 
> != 7,10 E 2,0 != None E 2,0 != None E 224,80 != None E 252,90 
> != None E 28,10 != None E 3,10 != None E 3,10 != None E 4,0 
> != None E 4,0 != None E 5,10 != None E 5,10 != None E 56,20 
> != None E 6,0 != None E 6,0 != None E 7,10 != None E 7,10 != 
> None E 84,30 != None E Number of rows returned (expected vs actual): 
> 27 != 10
> Stacktrace
> query_test/test_queries.py:77: in test_union
> self.run_test_case('QueryTest/union', vector)
> common/impala_test_suite.py:650: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:487: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,0 != 1,10
> E 0,0 != 2,0
> E 0,0 != 224,80
> E 1,10 != 252,90
> E 1,10 != 3,10
> E 1000,2000 != 4,0
> E 112,40 != 4,0
> E 140,50 != 5,10
> E 168,60 != 6,0
> E 196,70 != 7,10
> E 2,0 != None
> E 2,0 != None
> E 224,80 != None
> E 252,90 != None
> E 28,10 != None
> E 3,10 != None
> E 3,10 != None
> E 4,0 != None
> E 4,0 != None
> E 5,10 != None
> E 5,10 != None
> E 56,20 != None
> E 6,0 != None
> E 6,0 != None
> E 7,10 != None
> E 7,10 != None
> E 84,30 != None
> E Number of rows returned (expected vs actual): 27 != 10
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (IMPALA-9098) TestQueries.test_union failed

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9098:
--
Attachment: profile_incorrect.txt
profile_correct.txt

> TestQueries.test_union failed
> -
>
> Key: IMPALA-9098
> URL: https://issues.apache.org/jira/browse/IMPALA-9098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Andrew Sherman
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: profile_correct.txt, profile_incorrect.txt
>
>
> This happened once in an ASAN build. This *might* be a flaky test, like 
> IMPALA-8959 or it just might be a regression caused by IMPALA-8999
> {code}
> Error Message
> query_test/test_queries.py:77: in test_union 
> self.run_test_case('QueryTest/union', vector) 
> common/impala_test_suite.py:650: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:487: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 0,0 != 1,10 E 0,0 != 2,0 E   
>   0,0 != 224,80 E 1,10 != 252,90 E 1,10 != 3,10 E 1000,2000 != 
> 4,0 E 112,40 != 4,0 E 140,50 != 5,10 E 168,60 != 6,0 E 196,70 
> != 7,10 E 2,0 != None E 2,0 != None E 224,80 != None E 252,90 
> != None E 28,10 != None E 3,10 != None E 3,10 != None E 4,0 
> != None E 4,0 != None E 5,10 != None E 5,10 != None E 56,20 
> != None E 6,0 != None E 6,0 != None E 7,10 != None E 7,10 != 
> None E 84,30 != None E Number of rows returned (expected vs actual): 
> 27 != 10
> Stacktrace
> query_test/test_queries.py:77: in test_union
> self.run_test_case('QueryTest/union', vector)
> common/impala_test_suite.py:650: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:487: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,0 != 1,10
> E 0,0 != 2,0
> E 0,0 != 224,80
> E 1,10 != 252,90
> E 1,10 != 3,10
> E 1000,2000 != 4,0
> E 112,40 != 4,0
> E 140,50 != 5,10
> E 168,60 != 6,0
> E 196,70 != 7,10
> E 2,0 != None
> E 2,0 != None
> E 224,80 != None
> E 252,90 != None
> E 28,10 != None
> E 3,10 != None
> E 3,10 != None
> E 4,0 != None
> E 4,0 != None
> E 5,10 != None
> E 5,10 != None
> E 56,20 != None
> E 6,0 != None
> E 6,0 != None
> E 7,10 != None
> E 7,10 != None
> E 84,30 != None
> E Number of rows returned (expected vs actual): 27 != 10
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-9115) "Exec at coord is" log spam

2019-10-31 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-9115:
-

 Summary: "Exec at coord is" log spam
 Key: IMPALA-9115
 URL: https://issues.apache.org/jira/browse/IMPALA-9115
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Tim Armstrong
Assignee: Bikramjeet Vig


I see a lot of this in the logs; maybe we should move it to VLOG(2)? (Sketch 
after the excerpt below.)

{noformat}
I1026 04:26:15.066264 119815 scheduler.cc:548] 
394ed7e2a194714b:31ab5b2f] Exec at coord is false
I1026 04:26:15.068248 119815 scheduler.cc:548] 
394ed7e2a194714b:31ab5b2f] Exec at coord is false
I1026 04:26:15.069190 119815 scheduler.cc:548] 
394ed7e2a194714b:31ab5b2f] Exec at coord is false
I1026 04:26:15.070245 119815 scheduler.cc:548] 
394ed7e2a194714b:31ab5b2f] Exec at coord is false
{noformat}
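
For reference, a sketch of the proposed change, assuming the message comes from 
a plain glog LOG(INFO) statement in scheduler.cc:

{code:cpp}
#include <glog/logging.h>

// Sketch: demote the per-fragment message from the default INFO level to
// verbose level 2, so it only appears when the daemon is started with --v=2.
void LogExecAtCoord(bool exec_at_coord) {
  // Current (spammy at the default log level):
  // LOG(INFO) << "Exec at coord is " << exec_at_coord;

  // Proposed:
  VLOG(2) << "Exec at coord is " << exec_at_coord;
}
{code}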



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Assigned] (IMPALA-7356) Stress test for memory-based admission control

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7356:
-

Assignee: (was: Tim Armstrong)

> Stress test for memory-based admission control
> --
>
> Key: IMPALA-7356
> URL: https://issues.apache.org/jira/browse/IMPALA-7356
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: admission-control
>
> We should extend the existing stress test to have a new mode designed to test 
> memory-based admission control, where the stress test framework does not try 
> to throttle memory consumption but instead relies on Impala doing so. 
> The required changes would be:
> * A mode to disable throttling
> * Options for stricter pass conditions - queries should not fail with OOM 
> even if the stress test tries to submit way too many queries. 
> * However AC queue timeouts may be ok.
> * Investigation into the logic for choosing which query to run next and when 
> - does that need to change?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (IMPALA-9114) TestKuduHMSIntegration failing: Kudu create table failing

2019-10-31 Thread Hao Hao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964454#comment-16964454
 ] 

Hao Hao commented on IMPALA-9114:
-

Looking at the Kudu log, this appears to be a test issue: somehow 
TestKuduHMSIntegration ran even though the Kudu service was not running (there is 
no Kudu log around 2019-10-31 12:46). I cannot find a log that shows why this 
happened, as the test should only start once the Kudu service has been 
[restarted|https://github.com/apache/impala/blob/master/tests/common/custom_cluster_test_suite.py#L215].
 Anyway, lowering the priority, as it appears to be a test issue.

> TestKuduHMSIntegration failing: Kudu create table failing
> -
>
> Key: IMPALA-9114
> URL: https://issues.apache.org/jira/browse/IMPALA-9114
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Bikramjeet Vig
>Assignee: Hao Hao
>Priority: Critical
>  Labels: broken-build
>
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Cannot 
> analyze Kudu table 't': Error determining if Kudu's integration with the Hive 
> Metastore is enabled: cannot complete before timeout: 
> KuduRpc(method=getHiveMetastoreConfig, tablet=null, attempt=97, 
> TimeoutTracker(timeout=18, elapsed=178723), Trace Summary(177842 ms): 
> Sent(0), Received(0), Delayed(96), MasterRefresh(0), AuthRefresh(0), 
> Truncated: false  Delayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
> Stacktrace
> custom_cluster/test_kudu.py:150: in test_create_managed_kudu_tables
> self.run_test_case('QueryTest/kudu_create', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:621: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:556: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:893: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Cannot analyze Kudu table 't': Error 
> determining if Kudu's integration with the Hive Metastore is enabled: cannot 
> complete before timeout: KuduRpc(method=getHiveMetastoreConfig, tablet=null, 
> attempt=97, TimeoutTracker(timeout=18, elapsed=178723), Trace 
> Summary(177842 ms): Sent(0), Received(0), Delayed(96), MasterRefresh(0), 
> AuthRefresh(0), Truncated: false
> EDelayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
> Standard Output
> Stopping kudu
> Starting kudu (Web UI - http://localhost:8051)
> Standard Error
> -- 2019-10-31 12:46:18,353 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=-kudu_client_rpc_timeout_ms=3 ' 
> '--state_store_args=None ' --impalad_args=--default_query_options=
> 12:46:18 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 12:46:18 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 12:46:18 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> 

[jira] [Updated] (IMPALA-9114) TestKuduHMSIntegration failing: Kudu create table failing

2019-10-31 Thread Hao Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao updated IMPALA-9114:

Priority: Minor  (was: Critical)

> TestKuduHMSIntegration failing: Kudu create table failing
> -
>
> Key: IMPALA-9114
> URL: https://issues.apache.org/jira/browse/IMPALA-9114
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Bikramjeet Vig
>Assignee: Hao Hao
>Priority: Minor
>  Labels: broken-build
>
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Cannot 
> analyze Kudu table 't': Error determining if Kudu's integration with the Hive 
> Metastore is enabled: cannot complete before timeout: 
> KuduRpc(method=getHiveMetastoreConfig, tablet=null, attempt=97, 
> TimeoutTracker(timeout=18, elapsed=178723), Trace Summary(177842 ms): 
> Sent(0), Received(0), Delayed(96), MasterRefresh(0), AuthRefresh(0), 
> Truncated: false  Delayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
> Stacktrace
> custom_cluster/test_kudu.py:150: in test_create_managed_kudu_tables
> self.run_test_case('QueryTest/kudu_create', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:621: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:556: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:893: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Cannot analyze Kudu table 't': Error 
> determining if Kudu's integration with the Hive Metastore is enabled: cannot 
> complete before timeout: KuduRpc(method=getHiveMetastoreConfig, tablet=null, 
> attempt=97, TimeoutTracker(timeout=18, elapsed=178723), Trace 
> Summary(177842 ms): Sent(0), Received(0), Delayed(96), MasterRefresh(0), 
> AuthRefresh(0), Truncated: false
> EDelayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
> Standard Output
> Stopping kudu
> Starting kudu (Web UI - http://localhost:8051)
> Standard Error
> -- 2019-10-31 12:46:18,353 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=-kudu_client_rpc_timeout_ms=3 ' 
> '--state_store_args=None ' --impalad_args=--default_query_options=
> 12:46:18 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 12:46:18 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 12:46:18 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 12:46:18 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 12:46:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 12:46:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 12:46:21 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-0645.vpc.cloudera.com:25000
> 12:46:22 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 12:46:23 MainThread: Found 3 impalad/1 

[jira] [Closed] (IMPALA-8768) Clarifying the conditions in which audit logs record a query

2019-10-31 Thread Alexandra Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandra Rodoni closed IMPALA-8768.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Clarifying the conditions in which audit logs record a query
> 
>
> Key: IMPALA-8768
> URL: https://issues.apache.org/jira/browse/IMPALA-8768
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.13.0, Impala 3.3.0
>Reporter: Vincent Tran
>Assignee: Alexandra Rodoni
>Priority: Minor
> Fix For: Impala 3.4.0
>
>
> Currently, Impala documentation highlights the following cases as operations 
> which the audit logs will record:
> {noformat}
> Which Operations Are Audited
> The kinds of SQL queries represented in the audit log are:
> Queries that are prevented due to lack of authorization.
> Queries that Impala can analyze and parse to determine that they are 
> authorized. The audit data is recorded immediately after Impala finishes its 
> analysis, before the query is actually executed.
> The audit log does not contain entries for queries that could not be parsed 
> and analyzed. For example, a query that fails due to a syntax error is not 
> recorded in the audit log. The audit log also does not contain queries that 
> fail due to a reference to a table that does not exist, if you would be 
> authorized to access the table if it did exist.
> Certain statements in the impala-shell interpreter, such as CONNECT, SUMMARY, 
> PROFILE, SET, and QUIT, do not correspond to actual SQL queries, and these 
> statements are not reflected in the audit log.
> {noformat}
> However, based on [1], there is an unmentioned condition that the client must 
> have issued at least one fetch for analyzed queries to be recorded in audit 
> logs.
> [1] 
> https://github.com/apache/impala/blob/b3b00da1a1c7b98e84debe11c10258c4a0dff944/be/src/service/impala-server.cc#L690-L734



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Commented] (IMPALA-8768) Clarifying the conditions in which audit logs record a query

2019-10-31 Thread Alexandra Rodoni (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964435#comment-16964435
 ] 

Alexandra Rodoni commented on IMPALA-8768:
--

https://gerrit.cloudera.org/#/c/14575/

> Clarifying the conditions in which audit logs record a query
> 
>
> Key: IMPALA-8768
> URL: https://issues.apache.org/jira/browse/IMPALA-8768
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.13.0, Impala 3.3.0
>Reporter: Vincent Tran
>Assignee: Alexandra Rodoni
>Priority: Minor
>
> Currently, Impala documentation highlights the following cases as operations 
> which the audit logs will record:
> {noformat}
> Which Operations Are Audited
> The kinds of SQL queries represented in the audit log are:
> Queries that are prevented due to lack of authorization.
> Queries that Impala can analyze and parse to determine that they are 
> authorized. The audit data is recorded immediately after Impala finishes its 
> analysis, before the query is actually executed.
> The audit log does not contain entries for queries that could not be parsed 
> and analyzed. For example, a query that fails due to a syntax error is not 
> recorded in the audit log. The audit log also does not contain queries that 
> fail due to a reference to a table that does not exist, if you would be 
> authorized to access the table if it did exist.
> Certain statements in the impala-shell interpreter, such as CONNECT, SUMMARY, 
> PROFILE, SET, and QUIT, do not correspond to actual SQL queries, and these 
> statements are not reflected in the audit log.
> {noformat}
> However, based on [1], there is an unmentioned condition that the client must 
> have issued at least one fetch for analyzed queries to be recorded in audit 
> logs.
> [1] 
> https://github.com/apache/impala/blob/b3b00da1a1c7b98e84debe11c10258c4a0dff944/be/src/service/impala-server.cc#L690-L734



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Work started] (IMPALA-9085) Impala Doc: Refactor impala_s3.html

2019-10-31 Thread Alexandra Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9085 started by Alexandra Rodoni.

> Impala Doc: Refactor impala_s3.html
> ---
>
> Key: IMPALA-9085
> URL: https://issues.apache.org/jira/browse/IMPALA-9085
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Alexandra Rodoni
>Assignee: Alexandra Rodoni
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (IMPALA-9073) Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency

2019-10-31 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964428#comment-16964428
 ] 

Tim Armstrong commented on IMPALA-9073:
---

I think this is because of IMPALA-8803 - releasing resources per-backend. It is 
possible to have > 3 concurrent queries even with only 3 slots per backend if 
the previous queries had released resources on those backends. I think the test 
should actually be checking the number of queries on each backend.
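
A simplified model of that interaction (illustrative only, not the 
admission-controller source): a query holds one slot on each backend it runs 
on, and each slot is released as soon as that particular backend finishes, so 
cluster-wide concurrency can exceed the per-backend slot count even though no 
single backend ever does. The log excerpt below shows exactly this 
release/re-admit pattern.

{code:cpp}
#include <map>
#include <string>
#include <vector>

// Illustrative model only. Each backend has a fixed number of admission
// slots (3 in this test); a query takes one slot on every backend it runs
// on and gives each slot back as soon as that backend completes.
struct BackendSlots {
  int total = 3;
  int in_use = 0;
};

// Admit only if every backend the query needs still has a free slot.
bool TryAdmit(std::map<std::string, BackendSlots>& backends,
              const std::vector<std::string>& hosts) {
  for (const auto& h : hosts) {
    if (backends[h].in_use >= backends[h].total) return false;  // "3/3 already in use"
  }
  for (const auto& h : hosts) ++backends[h].in_use;
  return true;
}

// Called as each backend of a query completes; other backends of the same
// query may still be running, which is why more than 3 queries can be in
// flight cluster-wide while every individual backend stays within 3 slots.
void ReleaseBackendSlot(std::map<std::string, BackendSlots>& backends,
                        const std::string& host) {
  --backends[host].in_use;
}
{code}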

{noformat}
I1024 22:24:48.146070 106355 admission-controller.cc:1303] Stats: 
agg_num_running=3, agg_num_queued=2, agg_mem_reserved=815.26 KB,  
local_host(local_mem_admitted=300.05 MB, num_admitted_running=3, num_queued=2, 
backend_mem_reserved=75.87 KB)
I1024 22:24:48.146086 106355 admission-controller.cc:1509] Could not dequeue 
query id=3e4bce299fd907b1:f92ff5cf reason: Not enough admission control 
slots available on host ip-172-31-20-105:22000. Needed 1 slots but 3/3 are 
already in use.
I1024 22:24:48.179689 107036 coordinator.cc:508] ExecState: query 
id=f9492c4fae791967:35b096ec execution completed
I1024 22:24:48.179709 107036 coordinator.cc:644] Coordinator waiting for 
backends to finish, 1 remaining. query_id=f9492c4fae791967:35b096ec
I1024 22:24:48.179702 107084 krpc-data-stream-mgr.cc:298] 
f9492c4fae791967:35b096ec] DeregisterRecvr(): 
fragment_instance_id=f9492c4fae791967:35b096ec, node=1
I1024 22:24:48.179896 107084 query-state.cc:652] 
f9492c4fae791967:35b096ec] Instance completed. 
instance_id=f9492c4fae791967:35b096ec #in-flight=2 status=OK
I1024 22:24:48.179916 107081 query-state.cc:287] 
f9492c4fae791967:35b096ec] UpdateBackendExecState(): last report for 
f9492c4fae791967:35b096ec
I1024 22:24:48.179930 107088 krpc-data-stream-mgr.cc:298] 
e147d5691862e504:f48934e7] DeregisterRecvr(): 
fragment_instance_id=e147d5691862e504:f48934e7, node=1
I1024 22:24:48.179937 107038 coordinator.cc:508] ExecState: query 
id=e147d5691862e504:f48934e7 execution completed
I1024 22:24:48.179980 107038 coordinator.cc:644] Coordinator waiting for 
backends to finish, 1 remaining. query_id=e147d5691862e504:f48934e7
I1024 22:24:48.180224 107088 query-state.cc:652] 
e147d5691862e504:f48934e7] Instance completed. 
instance_id=e147d5691862e504:f48934e7 #in-flight=1 status=OK
I1024 22:24:48.180279 107083 query-state.cc:287] 
e147d5691862e504:f48934e7] UpdateBackendExecState(): last report for 
e147d5691862e504:f48934e7
I1024 22:24:48.180637 107035 coordinator.cc:508] ExecState: query 
id=1c48aa10421e20d8:677b55c4 execution completed
I1024 22:24:48.180656 107089 krpc-data-stream-mgr.cc:298] 
1c48aa10421e20d8:677b55c4] DeregisterRecvr(): 
fragment_instance_id=1c48aa10421e20d8:677b55c4, node=1
I1024 22:24:48.180670 107035 coordinator.cc:644] Coordinator waiting for 
backends to finish, 1 remaining. query_id=1c48aa10421e20d8:677b55c4
I1024 22:24:48.180932 107089 query-state.cc:652] 
1c48aa10421e20d8:677b55c4] Instance completed. 
instance_id=1c48aa10421e20d8:677b55c4 #in-flight=0 status=OK
I1024 22:24:48.180950 107082 query-state.cc:287] 
1c48aa10421e20d8:677b55c4] UpdateBackendExecState(): last report for 
1c48aa10421e20d8:677b55c4
I1024 22:24:48.181324 106187 coordinator.cc:768] Backend completed: 
host=ip-172-31-20-105:22000 remaining=1 
query_id=f9492c4fae791967:35b096ec
I1024 22:24:48.181390 107036 coordinator.cc:960] Release admission control 
resources for query_id=f9492c4fae791967:35b096ec
I1024 22:24:48.181421 106355 admission-controller.cc:1291] Trying to admit 
id=3e4bce299fd907b1:f92ff5cf in pool_name=default-pool 
executor_group_name=default-pool-group1 per_host_mem_estimate=176.02 MB 
dedicated_coord_mem_estimate=100.02 MB max_requests=-1 (configured statically) 
max_queued=200 (configured statically) max_mem=-1.00 B (configured statically)
I1024 22:24:48.181447 106355 admission-controller.cc:1303] Stats: 
agg_num_running=3, agg_num_queued=2, agg_mem_reserved=815.26 KB,  
local_host(local_mem_admitted=200.03 MB, num_admitted_running=3, num_queued=2, 
backend_mem_reserved=75.87 KB)
I1024 22:24:48.181470 106355 admission-controller.cc:1443] Admitting from 
queue: query=3e4bce299fd907b1:f92ff5cf
I1024 22:24:48.181493 107081 query-state.cc:448] 
f9492c4fae791967:35b096ec] Cancelling fragment instances as directed by 
the coordinator. Returned status: Cancelled
I1024 22:24:48.181511 107081 query-state.cc:669] 
f9492c4fae791967:35b096ec] Cancel: 
query_id=f9492c4fae791967:35b096ec
I1024 22:24:48.181520 107081 krpc-data-stream-mgr.cc:329] 
f9492c4fae791967:35b096ec] cancelling active streams for 
fragment_instance_id=f9492c4fae791967:35b096ec
I1024 22:24:48.181553 106355 admission-controller.cc:1291] 

[jira] [Commented] (IMPALA-9098) TestQueries.test_union failed

2019-10-31 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964420#comment-16964420
 ] 

Tim Armstrong commented on IMPALA-9098:
---

No luck reproducing this.

> TestQueries.test_union failed
> -
>
> Key: IMPALA-9098
> URL: https://issues.apache.org/jira/browse/IMPALA-9098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Andrew Sherman
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
>
> This happened once in an ASAN build. This *might* be a flaky test, like 
> IMPALA-8959 or it just might be a regression caused by IMPALA-8999
> {code}
> Error Message
> query_test/test_queries.py:77: in test_union 
> self.run_test_case('QueryTest/union', vector) 
> common/impala_test_suite.py:650: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:487: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 0,0 != 1,10 E 0,0 != 2,0 E   
>   0,0 != 224,80 E 1,10 != 252,90 E 1,10 != 3,10 E 1000,2000 != 
> 4,0 E 112,40 != 4,0 E 140,50 != 5,10 E 168,60 != 6,0 E 196,70 
> != 7,10 E 2,0 != None E 2,0 != None E 224,80 != None E 252,90 
> != None E 28,10 != None E 3,10 != None E 3,10 != None E 4,0 
> != None E 4,0 != None E 5,10 != None E 5,10 != None E 56,20 
> != None E 6,0 != None E 6,0 != None E 7,10 != None E 7,10 != 
> None E 84,30 != None E Number of rows returned (expected vs actual): 
> 27 != 10
> Stacktrace
> query_test/test_queries.py:77: in test_union
> self.run_test_case('QueryTest/union', vector)
> common/impala_test_suite.py:650: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:487: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,0 != 1,10
> E 0,0 != 2,0
> E 0,0 != 224,80
> E 1,10 != 252,90
> E 1,10 != 3,10
> E 1000,2000 != 4,0
> E 112,40 != 4,0
> E 140,50 != 5,10
> E 168,60 != 6,0
> E 196,70 != 7,10
> E 2,0 != None
> E 2,0 != None
> E 224,80 != None
> E 252,90 != None
> E 28,10 != None
> E 3,10 != None
> E 3,10 != None
> E 4,0 != None
> E 4,0 != None
> E 5,10 != None
> E 5,10 != None
> E 56,20 != None
> E 6,0 != None
> E 6,0 != None
> E 7,10 != None
> E 7,10 != None
> E 84,30 != None
> E Number of rows returned (expected vs actual): 27 != 10
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8557) Impala on ABFS failed with error "IllegalArgumentException: ABFS does not allow files or directories to end with a dot."

2019-10-31 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964418#comment-16964418
 ] 

Sahil Takiar commented on IMPALA-8557:
--

I can think of two solutions: (1) add a "txt" extension to all written text files, or 
(2) if no file extension is specified, drop the trailing dot from the generated file 
name. Even if we implement option #1, we should implement #2 as well to make the code 
more defensive and prevent this from recurring if we add support for writing 
additional file types.
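
A minimal sketch of option #2, assuming a hypothetical helper that builds the temp file 
name with the same "$0.$1.$2" pattern referenced in the description below; only the 
trailing-dot handling is the point:
{code}
#include <iostream>
#include <string>

// Hypothetical helper mirroring the "$0.$1.$2" temp-file pattern; not the
// actual hdfs-table-sink.cc code. Option #2: only append the '.' separator
// when there is an extension, so TEXT tables (empty extension) never produce
// a name ending in a dot.
std::string BuildTmpHdfsFileName(const std::string& prefix, int file_no,
                                 const std::string& extension) {
  std::string name = prefix + "." + std::to_string(file_no);
  if (!extension.empty()) name += "." + extension;
  return name;
}

int main() {
  std::cout << BuildTmpHdfsFileName("_impala_insert_staging/data", 0, "parq") << "\n";
  std::cout << BuildTmpHdfsFileName("_impala_insert_staging/data", 0, "") << "\n";
  return 0;
}
{code}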

> Impala on ABFS failed with error "IllegalArgumentException: ABFS does not 
> allow files or directories to end with a dot."
> 
>
> Key: IMPALA-8557
> URL: https://issues.apache.org/jira/browse/IMPALA-8557
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Eric Lin
>Assignee: Sahil Takiar
>Priority: Major
>
> HDFS introduced the feature below to stop users from creating files or 
> directories that end with "." on ABFS:
> https://issues.apache.org/jira/browse/HADOOP-15860
> As a result of this change, Impala writes to ABFS now fail with this error.
> I can see that it generates the temp file using the format "$0.$1.$2":
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329
> $2 is the file extension and will be empty for the TEXT file format:
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65
> Since HADOOP-15860 was backported into CDH 6.2, it currently only affects 
> 6.2 and works in older versions.
> There is no way to override this empty file extension, so no workaround is 
> possible unless the user chooses another file format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8557) Impala on ABFS failed with error "IllegalArgumentException: ABFS does not allow files or directories to end with a dot."

2019-10-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-8557:


Assignee: Sahil Takiar

> Impala on ABFS failed with error "IllegalArgumentException: ABFS does not 
> allow files or directories to end with a dot."
> 
>
> Key: IMPALA-8557
> URL: https://issues.apache.org/jira/browse/IMPALA-8557
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Eric Lin
>Assignee: Sahil Takiar
>Priority: Major
>
> HDFS introduced the feature below to stop users from creating files or 
> directories that end with "." on ABFS:
> https://issues.apache.org/jira/browse/HADOOP-15860
> As a result of this change, Impala writes to ABFS now fail with this error.
> I can see that it generates the temp file using the format "$0.$1.$2":
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329
> $2 is the file extension and will be empty for the TEXT file format:
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65
> Since HADOOP-15860 was backported into CDH 6.2, it currently only affects 
> 6.2 and works in older versions.
> There is no way to override this empty file extension, so no workaround is 
> possible unless the user chooses another file format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9048) Impala Doc: Document the global INVALIDATE METADATA on fetch-on-demand impalad

2019-10-31 Thread Alexandra Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandra Rodoni updated IMPALA-9048:
-
Description: 
The feature code review: https://gerrit.cloudera.org/#/c/14307/


> Impala Doc: Document the global INVALIDATE METADATA on fetch-on-demand impalad
> --
>
> Key: IMPALA-9048
> URL: https://issues.apache.org/jira/browse/IMPALA-9048
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Alexandra Rodoni
>Assignee: Alexandra Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>
> The feature code review: https://gerrit.cloudera.org/#/c/14307/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8755) Implement Z-ordering for Impala

2019-10-31 Thread Alexandra Rodoni (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964414#comment-16964414
 ] 

Alexandra Rodoni commented on IMPALA-8755:
--

[~norbertluksa] [~boroknagyz] Should this be documented now for 3.4? Or should 
we wait until the custom sorting with Z-Order is implemented before documenting 
this?

> Implement Z-ordering for Impala
> ---
>
> Key: IMPALA-8755
> URL: https://issues.apache.org/jira/browse/IMPALA-8755
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Norbert Luksa
>Priority: Major
>
> Implement Z-ordering for Impala: [https://en.wikipedia.org/wiki/Z-order_curve]
> A Z-order curve defines an ordering on multi-dimensional data. Data sorted 
> that way can be efficiently filtered by min/max statistics for the 
> columns participating in the ordering.
> Impala currently only supports lexicographic ordering via the SORT BY clause. 
> This strongly prefers the first column, i.e. given the "SORT BY A, B, C" 
> clause, A will be totally ordered (hence filtering on A will be very 
> efficient), but values belonging to B and C will be scattered throughout the 
> data set (hence filtering on B or C will barely do any good).
> We could add a new clause, e.g. a "ZSORT BY" clause, to Impala that writes the 
> data in Z-order.
> "ZSORT BY A, B, C" would cluster the rows in a way that filtering on A, B, or 
> C would be equally efficient.
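
One common way to get that behaviour (not necessarily what an eventual Impala patch 
would do) is to sort rows by a key formed from the interleaved bits of the 
participating columns; a minimal sketch for two 32-bit columns:
{code}
#include <cstdint>
#include <iostream>

// Interleave the bits of two 32-bit values into one 64-bit Z-order (Morton)
// key: bit i of 'a' lands at position 2*i, bit i of 'b' at position 2*i + 1,
// so neither column dominates the sort order.
uint64_t ZOrderKey(uint32_t a, uint32_t b) {
  uint64_t key = 0;
  for (int i = 0; i < 32; ++i) {
    key |= static_cast<uint64_t>((a >> i) & 1) << (2 * i);
    key |= static_cast<uint64_t>((b >> i) & 1) << (2 * i + 1);
  }
  return key;
}

int main() {
  // Neighbouring (a, b) pairs get neighbouring keys: 14 and 15 here.
  std::cout << ZOrderKey(2, 3) << " " << ZOrderKey(3, 3) << "\n";
  return 0;
}
{code}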



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9114) TestKuduHMSIntegration failing: Kudu create table failing

2019-10-31 Thread Bikramjeet Vig (Jira)
Bikramjeet Vig created IMPALA-9114:
--

 Summary: TestKuduHMSIntegration failing: Kudu create table failing
 Key: IMPALA-9114
 URL: https://issues.apache.org/jira/browse/IMPALA-9114
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.4.0
Reporter: Bikramjeet Vig
Assignee: Hao Hao


{noformat}
Error Message
ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION:   MESSAGE: AnalysisException: Cannot analyze 
Kudu table 't': Error determining if Kudu's integration with the Hive Metastore 
is enabled: cannot complete before timeout: 
KuduRpc(method=getHiveMetastoreConfig, tablet=null, attempt=97, 
TimeoutTracker(timeout=18, elapsed=178723), Trace Summary(177842 ms): 
Sent(0), Received(0), Delayed(96), MasterRefresh(0), AuthRefresh(0), Truncated: 
false  Delayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
Stacktrace
custom_cluster/test_kudu.py:150: in test_create_managed_kudu_tables
self.run_test_case('QueryTest/kudu_create', vector, use_db=unique_database)
common/impala_test_suite.py:621: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:556: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:893: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:362: in __execute_query
handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:356: in execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:519: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EINNER EXCEPTION: 
EMESSAGE: AnalysisException: Cannot analyze Kudu table 't': Error 
determining if Kudu's integration with the Hive Metastore is enabled: cannot 
complete before timeout: KuduRpc(method=getHiveMetastoreConfig, tablet=null, 
attempt=97, TimeoutTracker(timeout=18, elapsed=178723), Trace 
Summary(177842 ms): Sent(0), Received(0), Delayed(96), MasterRefresh(0), 
AuthRefresh(0), Truncated: false
EDelayed: (UNKNOWN, [ getHiveMetastoreConfig, 96 ]))
Standard Output
Stopping kudu
Starting kudu (Web UI - http://localhost:8051)
Standard Error
-- 2019-10-31 12:46:18,353 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad_args=-kudu_client_rpc_timeout_ms=3 ' 
'--state_store_args=None ' --impalad_args=--default_query_options=
12:46:18 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
12:46:18 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
12:46:18 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
12:46:18 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
12:46:18 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
12:46:18 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
12:46:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
12:46:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
12:46:21 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0645.vpc.cloudera.com:25000
12:46:22 MainThread: Waiting for num_known_live_backends=3. Current value: 0
12:46:23 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
12:46:23 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0645.vpc.cloudera.com:25000
12:46:23 MainThread: Waiting for num_known_live_backends=3. Current value: 0
12:46:24 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
12:46:24 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0645.vpc.cloudera.com:25000

[jira] [Created] (IMPALA-9113) Queries can hang if an impalad is killed after a query has FINISHED

2019-10-31 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9113:


 Summary: Queries can hang if an impalad is killed after a query 
has FINISHED
 Key: IMPALA-9113
 URL: https://issues.apache.org/jira/browse/IMPALA-9113
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Clients
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There is a race condition in the query coordination code that could cause 
queries to hang indefinitely in an un-cancellable state if an impalad crashes 
after the query has transitioned to the FINISHED state, but before all backends 
have completed.

The issue occurs if:
 * A query produces all results
 * A client issues a fetch request to read all of those results
 * The client fetch request fetches all available rows (e.g. eos is hit)
 * {{Coordinator::GetNext}} then calls 
{{SetNonErrorTerminalState(ExecState::RETURNED_RESULTS)}} which eventually 
calls {{WaitForBackends()}}
 * {{WaitForBackends()}} will block until all backends have completed
 * One of the impalads running the query crashes, and thus never reports 
success for the query fragment it was running
 * The {{WaitForBackends()}} call will then block indefinitely
 * Any attempt to cancel the query fails because the original fetch request 
that drove the {{WaitForBackends()}} call has acquired the 
{{ClientRequestState}} lock, which thus prevents any cancellation from 
occurring.

Implementing IMPALA-6984 should theoretically fix this, because as soon as eos is 
hit, it would call {{CancelBackends()}} rather than {{WaitForBackends()}}. 
Another solution would be to add a timeout to {{WaitForBackends()}} so that 
it returns after the timeout is hit; this would force the fetch request to 
return 0 rows with {{hasMoreRows=true}} and unblock any cancellation threads.
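
A rough sketch of the second option, a bounded wait, outside of Impala's actual 
coordinator classes (the class name, members and backend count below are illustrative):
{code}
#include <chrono>
#include <condition_variable>
#include <mutex>

// Sketch of a bounded WaitForBackends(): instead of blocking until every
// backend reports completion, give up after a deadline so the fetch RPC can
// return 0 rows with hasMoreRows=true, release its locks and let a
// cancellation thread make progress.
class BackendTracker {
 public:
  void BackendCompleted() {
    std::lock_guard<std::mutex> l(lock_);
    if (--remaining_ == 0) all_done_.notify_all();
  }

  // Returns true only if all backends finished before the timeout expired.
  bool WaitForBackends(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return all_done_.wait_for(l, timeout, [this] { return remaining_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable all_done_;
  int remaining_ = 3;  // illustrative: number of backends still executing
};
{code}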



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Work started] (IMPALA-4400) Aggregate runtime filters locally

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-4400 started by Tim Armstrong.
-
> Aggregate runtime filters locally
> -
>
> Key: IMPALA-4400
> URL: https://issues.apache.org/jira/browse/IMPALA-4400
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Marcel Kinard
>Assignee: Tim Armstrong
>Priority: Major
>
> At the moment, runtime filters are sent from each fragment instance directly 
> to the coordinator, which aggregates (ORs) them.
> With multi-threaded execution, we will have an order of magnitude more 
> fragment instances per node, at which point the coordinator would become a 
> bottleneck during the aggregation process. To avoid that, we need to 
> aggregate the local instances' runtime filters at each node before sending 
> the filter off to the coordinator.
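
The local aggregation step amounts to OR-ing the instances' filter bitmaps on each 
node before a single update is sent on; a tiny sketch of that merge, independent of 
Impala's actual BloomFilter/RuntimeFilterBank code (assumes equally sized bitmaps):
{code}
#include <cstdint>
#include <vector>

// OR a fragment instance's bloom-filter bitmap into the node-local aggregate.
// Once every local instance has been merged, only this aggregate needs to be
// sent to the coordinator, instead of one update per instance.
void MergeLocalFilter(std::vector<uint64_t>* node_aggregate,
                      const std::vector<uint64_t>& instance_filter) {
  for (size_t i = 0; i < node_aggregate->size(); ++i) {
    (*node_aggregate)[i] |= instance_filter[i];
  }
}
{code}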



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9065) Fix cancellation of RuntimeFilter::WaitForArrival()

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9065.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Fix cancellation of RuntimeFilter::WaitForArrival()
> ---
>
> Key: IMPALA-9065
> URL: https://issues.apache.org/jira/browse/IMPALA-9065
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Proper cancellation wasn't ever implemented for this code path, so if the 
> wait time is set high, threads can get blocked indefinitely even if the 
> coordinator cancelled the query.
> I don't think it's hard to do the right thing -  signal the filter and wake 
> up the thread when the finstance is cancelled.
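
The fix amounts to waking the waiter on cancellation as well as on filter arrival; a 
minimal standalone sketch of that pattern (member names invented, not the actual 
RuntimeFilter code):
{code}
#include <chrono>
#include <condition_variable>
#include <mutex>

// The waiter wakes up as soon as the filter arrives, the wait time elapses,
// OR the fragment instance is cancelled, instead of sleeping for the full
// wait time when the query has already been cancelled.
class FilterWaiter {
 public:
  void SetArrived() { Signal(&arrived_); }
  void SetCancelled() { Signal(&cancelled_); }

  // Returns true only if the filter actually arrived.
  bool WaitForArrival(std::chrono::milliseconds wait_time) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait_for(l, wait_time, [this] { return arrived_ || cancelled_; });
    return arrived_;
  }

 private:
  void Signal(bool* flag) {
    std::lock_guard<std::mutex> l(lock_);
    *flag = true;
    cv_.notify_all();
  }
  std::mutex lock_;
  std::condition_variable cv_;
  bool arrived_ = false;
  bool cancelled_ = false;
};
{code}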



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9108) Unused leveldbjni dependency triggers some security scanners

2019-10-31 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9108.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Unused leveldbjni dependency triggers some security scanners
> 
>
> Key: IMPALA-9108
> URL: https://issues.apache.org/jira/browse/IMPALA-9108
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> A windows dll in leveldbjni-all-1.8.jar is flagged by some security scanners. 
> We shouldn't have a dependency on leveldb, so we should exclude this and not 
> pull in the jar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9108) Unused leveldbjni dependency triggers some security scanners

2019-10-31 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964245#comment-16964245
 ] 

ASF subversion and git services commented on IMPALA-9108:
-

Commit 28b1d53f9cb7581974dfc0b2dd75f2f015c1c6b9 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=28b1d53 ]

IMPALA-9108: exclude leveldbjni mvn dependency

We don't need this at all - it's pulled in via
some transitive dependencies, e.g. htrace and
hive-serde.

Add an exclusion and add it as a banned dependency.

Change-Id: I90b63bc03511545530e1506bc602623591c56e98
Reviewed-on: http://gerrit.cloudera.org:8080/14593
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 


> Unused leveldbjni dependency triggers some security scanners
> 
>
> Key: IMPALA-9108
> URL: https://issues.apache.org/jira/browse/IMPALA-9108
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> A windows dll in leveldbjni-all-1.8.jar is flagged by some security scanners. 
> We shouldn't have a dependency on leveldb, so we should exclude this and not 
> pull in the jar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964206#comment-16964206
 ] 

Csaba Ringhofer edited comment on IMPALA-3933 at 10/31/19 4:54 PM:
---

[~mylogi...@gmail.com] 

I checked dates with an Impala build close to asf-master against CDH and CDP Hive. 
We behave differently depending on the Hive version.

{code}
>From Hive:
create table tdate (d date) stored as parquet;
insert into table tdate values ("0001-01-01"), ("1400-01-01"), ("1500-01-01"), 
("1800-01-01");
>From Impala:
invalidate metadata tdate;
select * from tdate;

When the data was inserted with CDP Hive, we return all values correctly:
++
| d  |
++
| 0001-01-01 |
| 1400-01-01 |
| 1500-01-01 |
| 1800-01-01 |
++
With CDH Hive, the very old dates are shifted, probably related to Julian vs 
Proleptic Gregorian interpretation of old dates:
++
| d  |
++
| NULL   |
| 1400-01-09 |
| 1500-01-10 |
| 1800-01-01 |
++
WARNINGS: Parquet file 'hdfs://localhost:20500/test-warehouse/tdate/00_0' 
column 'd' contains an out of range date. The valid date range is 
0001-01-01..-12-31.
{code}

So dates are also problematic with CDH Hive, but it is a different problem than 
the one described in this Jira. The original issue is about 
historical timezone rules, which do not affect dates, but very old dates are 
still affected by different Julian/Gregorian handling. I think Hive switched to 
the proleptic Gregorian calendar in Hive 3.1, so it is similar to Impala from that point.




was (Author: csringhofer):
[~mylogi...@gmail.com] 

I checked dates with clode to ~asf-master Impala and CDH/CDP Hives. We work 
differently depending on the Hive version.

{code}
>From Hive:
create table tdate (d date) stored as parquet;
insert into table tdate values ("0001-01-01"), ("1400-01-01"), ("1500-01-01"), 
("1800-01-01");
>From Impala:
invalidate metadata tdate;
select * from tdata;

When the data was inserted with CDP Hive, we return all values correctly:
++
| d  |
++
| 0001-01-01 |
| 1400-01-01 |
| 1500-01-01 |
| 1800-01-01 |
++
With CDH Hive, the very old dates are shifted, probably related to Julian vs 
Proleptic Gregorian interpretation of old dates:
++
| d  |
++
| NULL   |
| 1400-01-09 |
| 1500-01-10 |
| 1800-01-01 |
++
WARNINGS: Parquet file 'hdfs://localhost:20500/test-warehouse/tdate/00_0' 
column 'd' contains an out of range date. The valid date range is 
0001-01-01..-12-31.
{code}

So dates are also problematic with CDH Hive, but it is a different problem than 
the one described in the description of the Jira. The original issue is about 
historical timezone rules, which do not affect dates, but very old dates are 
still affected by different Julian/Gregorian handling. I think Hive switched to 
Proleptic Gregorian in Hive 3.1. so it is similar to Impal from that point.



> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4

[jira] [Commented] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964206#comment-16964206
 ] 

Csaba Ringhofer commented on IMPALA-3933:
-

[~mylogi...@gmail.com] 

I checked dates with an Impala build close to asf-master against CDH and CDP Hive. 
We behave differently depending on the Hive version.

{code}
>From Hive:
create table tdate (d date) stored as parquet;
insert into table tdate values ("0001-01-01"), ("1400-01-01"), ("1500-01-01"), 
("1800-01-01");
>From Impala:
invalidate metadata tdate;
select * from tdate;

When the data was inserted with CDP Hive, we return all values correctly:
++
| d  |
++
| 0001-01-01 |
| 1400-01-01 |
| 1500-01-01 |
| 1800-01-01 |
++
With CDH Hive, the very old dates are shifted, probably related to Julian vs 
Proleptic Gregorian interpretation of old dates:
++
| d  |
++
| NULL   |
| 1400-01-09 |
| 1500-01-10 |
| 1800-01-01 |
++
WARNINGS: Parquet file 'hdfs://localhost:20500/test-warehouse/tdate/00_0' 
column 'd' contains an out of range date. The valid date range is 
0001-01-01..-12-31.
{code}

So dates are also problematic with CDH Hive, but it is a different problem than 
the one described in this Jira. The original issue is about 
historical timezone rules, which do not affect dates, but very old dates are 
still affected by different Julian/Gregorian handling. I think Hive switched to 
the proleptic Gregorian calendar in Hive 3.1, so it is similar to Impala from that point.
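
The 8 and 9 day shifts (and the NULL for 0001-01-01) are consistent with the old 
writer using a hybrid Julian/Gregorian calendar (Julian rules before the 1582-10-15 
cutover) while the reader interprets the stored day count as proleptic Gregorian. A 
small standalone check using the standard Fliegel & Van Flandern day-number formulas; 
the writer's calendar is an assumption here, and this is not Impala or Hive code:
{code}
#include <cstdio>

// Day number of y-m-d interpreted as a proleptic Gregorian date.
long GregorianJdn(long y, long m, long d) {
  return (1461 * (y + 4800 + (m - 14) / 12)) / 4
       + (367 * (m - 2 - 12 * ((m - 14) / 12))) / 12
       - (3 * ((y + 4900 + (m - 14) / 12) / 100)) / 4
       + d - 32075;
}

// Day number of the same nominal date interpreted as a Julian-calendar date.
long JulianJdn(long y, long m, long d) {
  return 367 * y - (7 * (y + 5001 + (m - 9) / 7)) / 4 + (275 * m) / 9
       + d + 1729777;
}

// Assumed old-writer behaviour: Julian before the 1582-10-15 cutover,
// Gregorian after it (like java.sql.Date's hybrid calendar).
long HybridJdn(long y, long m, long d) {
  bool gregorian = y > 1582 || (y == 1582 && (m > 10 || (m == 10 && d >= 15)));
  return gregorian ? GregorianJdn(y, m, d) : JulianJdn(y, m, d);
}

int main() {
  long years[] = {1, 1400, 1500, 1800};
  for (long y : years) {
    // Prints -2, 8, 9, 0: 1400-01-01 and 1500-01-01 read back 8 and 9 days
    // late, 1800-01-01 is unchanged, and 0001-01-01 lands 2 days before the
    // valid DATE range, which shows up as NULL plus the out-of-range warning.
    std::printf("%04ld-01-01 shift: %ld days\n",
                y, HybridJdn(y, 1, 1) - GregorianJdn(y, 1, 1));
  }
  return 0;
}
{code}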



> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 10:34:45 |
> | 6| 1949-04-15 10:34:45 |
> | 7| 1753-04-15 11:34:45 |
> | 8| 1752-04-15 11:34:45 |
> +--+-+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 12:34:45 |
> | 6| 1949-04-15 12:34:45 |
> | 7| 1753-04-15 12:51:05 |
> | 8| 1752-04-15 12:51:05 |
> +--+-+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964196#comment-16964196
 ] 

Tim Armstrong commented on IMPALA-3933:
---

Dates don't have timezones or any kind of conversions, so yes - everything is a 
lot simpler.

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 10:34:45 |
> | 6| 1949-04-15 10:34:45 |
> | 7| 1753-04-15 11:34:45 |
> | 8| 1752-04-15 11:34:45 |
> +--+-+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 12:34:45 |
> | 6| 1949-04-15 12:34:45 |
> | 7| 1753-04-15 12:51:05 |
> | 8| 1752-04-15 12:51:05 |
> +--+-+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964176#comment-16964176
 ] 

Manish Maheshwari commented on IMPALA-3933:
---

Question - if we switch to DATE instead of TIMESTAMP, will issue #1 in the 
above comment be fixed?

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 10:34:45 |
> | 6| 1949-04-15 10:34:45 |
> | 7| 1753-04-15 11:34:45 |
> | 8| 1752-04-15 11:34:45 |
> +--+-+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +--+-+
> | col1 | myts|
> +--+-+
> | 1| 2016-04-15 12:34:45 |
> | 2| 1949-04-15 12:34:45 |
> | 3| 1753-04-15 12:34:45 |
> | 4| 1752-04-15 12:34:45 |
> | 5| 2016-04-15 12:34:45 |
> | 6| 1949-04-15 12:34:45 |
> | 7| 1753-04-15 12:51:05 |
> | 8| 1752-04-15 12:51:05 |
> +--+-+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9112) Consider removing hdfsExists calls when writing files to S3

2019-10-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-9112:
-
Summary: Consider removing hdfsExists calls when writing files to S3  (was: 
Consider removing hdfsExists calls when writing out files)

> Consider removing hdfsExists calls when writing files to S3
> ---
>
> Key: IMPALA-9112
> URL: https://issues.apache.org/jira/browse/IMPALA-9112
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are a few places in the backend where we call {{hdfsExists}} before 
> writing out a file. This can cause issues when writing data to S3, because S3 
> can cache 404 Not Found errors. This issue manifests itself with errors such 
> as:
> {code:java}
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
>  TO 
> s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq)
>  failed, error was: 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
> Error(5): Input/output error
> Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 
> 404; Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: 
> []){code}
> HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying 
> an "overwrite" option when creating a file; this can avoid doing any HEAD 
> requests when opening a file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9112) Consider removing hdfsExists calls when writing out files

2019-10-31 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9112:


 Summary: Consider removing hdfsExists calls when writing out files
 Key: IMPALA-9112
 URL: https://issues.apache.org/jira/browse/IMPALA-9112
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There are a few places in the backend where we call {{hdfsExists}} before 
writing out a file. This can cause issues when writing data to S3, because S3 
can cache 404 Not Found errors. This issue manifests itself with errors such as:
{code:java}
ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op (RENAME 
s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
 TO 
s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq)
 failed, error was: 
s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
Error(5): Input/output error
Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; 
Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: []){code}
HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying an 
"overwrite" option when creating a file; this can avoid doing any HEAD requests 
when opening a file.
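
A rough sketch of what dropping the probe could look like against the libhdfs API (not 
the actual Impala change; error handling omitted, and the header path varies by build). 
hdfsOpenFile() with O_WRONLY already means create-or-overwrite, so the separate 
hdfsExists() call, and the HEAD request behind it, can go away:
{code}
#include <fcntl.h>
#include <hdfs/hdfs.h>  // header location depends on the Hadoop layout

// Open 'path' for writing without probing for it first. With libhdfs,
// O_WRONLY (without O_APPEND) creates the file or overwrites an existing
// one, so the hdfsExists() pre-check - and the S3 HEAD request it
// triggers - is unnecessary.
hdfsFile CreateForWrite(hdfsFS fs, const char* path) {
  // Old pattern, prone to cached-404 problems on S3A:
  //   if (hdfsExists(fs, path) == 0) { /* handle "already exists" */ }
  return hdfsOpenFile(fs, path, O_WRONLY, /*bufferSize=*/0,
                      /*replication=*/0, /*blocksize=*/0);
}
{code}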



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails

2019-10-31 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9111:
--
Description: 
Starting the Impala cluster with
{code:java}
bin/start-impala-cluster.py --impalad_args="-disable_optimization_passes"{code}
 

the following query fails and Impala crashes:

 
{code:java}
SELECT d28_1
 FROM functional.decimal_rtf_tbl ORDER BY d28_1;{code}
 


This error happens if the inlining pass in OptimizeModule in 
be/src/codegen/llvm-codegen.cc is not run. It seems the problem only happens 
with decimals that need to be stored in 16 bytes. Maybe it is some ABI 
incompatibility with Decimal16Value.

Stack trace:
{code:java}
#0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x7fda6e64002a in __GI_abort () at abort.c:89
#2 0x7fda71707149 in os::abort(bool) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#3 0x7fda718bad27 in VMError::report_and_die() () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#4 0x7fda71710e4f in JVM_handle_linux_signal () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#6 
#7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, 
impala::ColumnType const&) ()
#8 0x7fd9c3438e25 in Compare ()
#9 0x02a26293 in impala::TupleRowComparator::Compare 
(rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at 
be/src/util/tuple-row-compare.h:98
#10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, 
this=0x1284e480) at be/src/util/tuple-row-compare.h:107
#11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, 
rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72
#12 0x02a27409 in impala::Sorter::TupleSorter::MedianOfThree 
(this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at 
be/src/runtime/sorter-ir.cc:214
#13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206
#14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:165
#15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, 
run=0x13974da0) at be/src/runtime/sorter.cc:755
#16 0x02a18e27 in impala::Sorter::SortCurrentInputRun (this=0x1284e3c0) 
at be/src/runtime/sorter.cc:956
#17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at 
be/src/runtime/sorter.cc:892
#18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:187
#19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:90
#20 0x020f289a in impala::FragmentInstanceState::Open (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:348
#21 0x020ef54c in impala::FragmentInstanceState::Exec (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:84
#22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, 
fis=0xe0571e0) at be/src/runtime/query-state.cc:650
#23 0x02101268 in impala::QueryStateoperator()(void) 
const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558
#24 0x02104c7d in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...)
at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
#25 0x01f04b46 in boost::function0::operator() 
(this=0x7fd9c3c4bca0) at 
toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
#26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) (Python Exception  No type named class std::basic_string, std::allocator >::_Rep.: 
name=, Python Exception  No type named class 
std::basic_string, std::allocator >::_Rep.: 
category=, functor=..., parent_thread_info=0x7fd9c4c4d950, 
thread_started=0x7fd9c4c4c8f0) at be/src/util/thread.cc:360
#27 0x02483e81 in boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0xd3857c0, 
f=@0xd3857b8: 0x247b796 , impala::ThreadDebugInfo const*, 
impala::Promise*)>, a=...)
at toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:525
#28 0x02483da5 in boost::_bi::bind_t, 

[jira] [Created] (IMPALA-9111) Sorting 'Decimal16Value's with codegen enabled but codegen optimizations disabled fails

2019-10-31 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-9111:
-

 Summary: Sorting 'Decimal16Value's with codegen enabled but 
codegen optimizations disabled fails
 Key: IMPALA-9111
 URL: https://issues.apache.org/jira/browse/IMPALA-9111
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker


Starting the Impala cluster with
{code:java}
bin/start-impala-cluster.py --impalad_args="-disable_optimization_passes"
{code}
the following query fails and Impala crashes:
{code:java}
SELECT d28_1
FROM functional.decimal_rtf_tbl ORDER BY d28_1;
{code}

This error happens if the inlining pass in OptimizeModule in 
be/src/codegen/llvm-codegen.cc is not run. It seems the problem only happens 
with decimals that need to be stored in 16 bytes. Maybe it is some ABI 
incompatibility with Decimal16Value.

Stack trace:
#0 0x7fda6e63e428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x7fda6e64002a in __GI_abort () at abort.c:89
#2 0x7fda71707149 in os::abort(bool) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#3 0x7fda718bad27 in VMError::report_and_die() () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#4 0x7fda71710e4f in JVM_handle_linux_signal () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#5 0x7fda71703e48 in signalHandler(int, siginfo_t*, void*) () from 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#6 
#7 0x7fd9c3437f8b in impala::RawValue::Compare(void const*, void const*, 
impala::ColumnType const&) ()
#8 0x7fd9c3438e25 in Compare ()
#9 0x02a26293 in impala::TupleRowComparator::Compare 
(rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, this=0x1284e480) at 
be/src/util/tuple-row-compare.h:98
#10 impala::TupleRowComparator::Less (rhs=0x7fd9c3c4a8b8, lhs=0x7fd9c3c4a8c0, 
this=0x1284e480) at be/src/util/tuple-row-compare.h:107
#11 impala::Sorter::TupleSorter::Less (this=0x137b2000, lhs=0x7fd9c3c4a8c0, 
rhs=0x7fd9c3c4a8b8) at be/src/runtime/sorter-ir.cc:72
#12 0x02a27409 in impala::Sorter::TupleSorter::MedianOfThree 
(this=0x137b2000, t1=0x14808e50, t2=0x14802d3f, t3=0x14808085) at 
be/src/runtime/sorter-ir.cc:214
#13 0x02a27394 in impala::Sorter::TupleSorter::SelectPivot 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:206
#14 0x02a26cd8 in impala::Sorter::TupleSorter::SortHelper 
(this=0x137b2000, begin=..., end=...) at be/src/runtime/sorter-ir.cc:165
#15 0x02a15e8a in impala::Sorter::TupleSorter::Sort (this=0x137b2000, 
run=0x13974da0) at be/src/runtime/sorter.cc:755
#16 0x02a18e27 in impala::Sorter::SortCurrentInputRun (this=0x1284e3c0) 
at be/src/runtime/sorter.cc:956
#17 0x02a183e7 in impala::Sorter::InputDone (this=0x1284e3c0) at 
be/src/runtime/sorter.cc:892
#18 0x0263bc18 in impala::SortNode::SortInput (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:187
#19 0x0263a8e0 in impala::SortNode::Open (this=0xdf63e40, 
state=0x11e652a0) at be/src/exec/sort-node.cc:90
#20 0x020f289a in impala::FragmentInstanceState::Open (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:348
#21 0x020ef54c in impala::FragmentInstanceState::Exec (this=0xe0571e0) 
at be/src/runtime/fragment-instance-state.cc:84
#22 0x02102f9b in impala::QueryState::ExecFInstance (this=0xd376000, 
fis=0xe0571e0) at be/src/runtime/query-state.cc:650
#23 0x02101268 in impala::QueryStateoperator()(void) 
const (__closure=0x7fd9c3c4bca8) at be/src/runtime/query-state.cc:558
#24 0x02104c7d in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...)
 at toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
#25 0x01f04b46 in boost::function0::operator() 
(this=0x7fd9c3c4bca0) at 
toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
#26 0x0247bafd in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) (Python Exception  No type named class std::basic_string, std::allocator >::_Rep.: 
name=, Python Exception  No type named class 
std::basic_string, std::allocator >::_Rep.: 
category=, functor=..., parent_thread_info=0x7fd9c4c4d950, 
 thread_started=0x7fd9c4c4c8f0) at be/src/util/thread.cc:360
#27 0x02483e81 in boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0xd3857c0, 
 f=@0xd3857b8: 0x247b796 , impala::ThreadDebugInfo 

[jira] [Updated] (IMPALA-9013) Column Masking DML support

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9013:

Priority: Critical  (was: Major)

> Column Masking DML support
> --
>
> Key: IMPALA-9013
> URL: https://issues.apache.org/jira/browse/IMPALA-9013
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Review Hive implementation to see if anything special needs to be done for 
> DML. The Hive column masking design doc does not reflect the current code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9011) Support column masking on CTEs, views, and derived column names

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9011:

Priority: Critical  (was: Major)

> Support column masking on CTEs, views, and derived column names
> ---
>
> Key: IMPALA-9011
> URL: https://issues.apache.org/jira/browse/IMPALA-9011
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> CTE/views: dig out underlying column and table names
>  derived column names i.e. select * from (select 1) as foo - Handle 
> appropriately.
> Also negative cases where the query has an invalid reference. i.e.
> WITH foo AS (SELECT c1 FROM t1) SELECT c1 FROM FOO;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9012) Allow access to columns with column masks and update tests

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9012:

Priority: Critical  (was: Major)

> Allow access to columns with column masks and update tests
> --
>
> Key: IMPALA-9012
> URL: https://issues.apache.org/jira/browse/IMPALA-9012
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Remove check in RangerAuthorizationChecker::authorizeTableAccess
> Remove testcase in RangerAuditLogTest.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9010) Support pre-defined mask types from Ranger UI

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9010:

Priority: Critical  (was: Major)

> Support pre-defined mask types from Ranger UI
> -
>
> Key: IMPALA-9010
> URL: https://issues.apache.org/jira/browse/IMPALA-9010
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Review Hive implementation/behavior.
> Redact/Partial/Hash/Nullify/Unmasked/Date
>  These will be implemented as static SQL transforms in Impala
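
For illustration only (not from the issue text above): assuming each of the six
pre-defined types maps to a fixed SQL rewrite of the masked column reference,
the static transforms could be sketched roughly as below. The SQL templates and
all names in this sketch are assumptions, not Impala's actual code.

{code:java}
import java.util.Map;

// Illustrative sketch only: possible static SQL rewrites for the pre-defined
// mask types named in this issue. Templates are assumptions, not Impala code.
public final class MaskTypeTransforms {
  // "{col}" marks where the masked column reference is substituted.
  private static final Map<String, String> SQL_TEMPLATES = Map.of(
      "Redact",   "regexp_replace(regexp_replace({col}, '[0-9]', 'n'), '[A-Za-z]', 'x')",
      "Partial",  "concat(repeat('x', greatest(length({col}) - 4, 0)), substr({col}, -4))",
      "Hash",     "sha2({col}, 256)",
      "Nullify",  "cast(NULL as string)",
      "Unmasked", "{col}",
      "Date",     "trunc({col}, 'YEAR')");

  static String rewrite(String maskType, String columnSql) {
    String template = SQL_TEMPLATES.get(maskType);
    if (template == null) throw new IllegalArgumentException("Unknown mask type: " + maskType);
    return template.replace("{col}", columnSql);
  }

  public static void main(String[] args) {
    // Prints: concat(repeat('x', greatest(length(c.phone) - 4, 0)), substr(c.phone, -4))
    System.out.println(rewrite("Partial", "c.phone"));
  }
}
{code}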



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9079) Add Auth Interfaces to retrieve column masks and implement for Ranger

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9079:

Priority: Critical  (was: Major)

> Add Auth Interfaces to retrieve column masks and implement for Ranger
> -
>
> Key: IMPALA-9079
> URL: https://issues.apache.org/jira/browse/IMPALA-9079
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Kurt Deschler
>Priority: Critical
>
> Mask definitions can be retrieved from the Ranger plugin. Analyzer has 
> access to AuthorizationFactory via Analyzer::getAuthzFactory(). There are 
> currently no interfaces through AuthorizationFactory or AuthorizationChecker 
> to access the column masks from the plugin. These will need to be added and 
> then implemented for the Ranger plugin.
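
A minimal sketch of what such an interface could look like, assuming a single
lookup method that the Ranger implementation would back with its plugin; the
type and method names below are hypothetical, not existing Impala or Ranger APIs.

{code:java}
import java.util.Optional;

// Hypothetical sketch only: stand-in types illustrating the missing interface.
// These are not Impala's AuthorizationChecker/AuthorizationFactory classes.
interface ColumnMaskProvider {
  /**
   * Returns the mask SQL expression defined for this user and column (for
   * example "CAST(NULL AS STRING)"), or empty if the column is not masked.
   */
  Optional<String> getColumnMask(String user, String db, String table, String column);
}

// A Ranger-backed implementation would delegate to the Ranger plugin's
// data-mask policy evaluation; stubbed out here for illustration.
class RangerColumnMaskProvider implements ColumnMaskProvider {
  @Override
  public Optional<String> getColumnMask(String user, String db, String table, String column) {
    return Optional.empty();  // real code would consult the Ranger plugin here
  }
}
{code}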



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9009) Core support for column mask transformation in select list

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9009:

Priority: Critical  (was: Major)

> Core support for column mask transformation in select list
> --
>
> Key: IMPALA-9009
> URL: https://issues.apache.org/jira/browse/IMPALA-9009
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Identify masked columns from SELECT list.
> Support custom (user supplied) mask SQL from Ranger.
> Parse column mask expressions and substitute into original statement
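
One possible shape for the substitution step, sketched with hypothetical helper
code (this is not Impala's implementation): wrap the masked table in an inline
view that projects each column as either itself or its mask expression, and
leave the rest of the original statement untouched.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch (not Impala code): substituting column-mask expressions
// into the original statement via an inline view over the masked table.
public final class MaskSubstitutionSketch {
  /** columnMasks maps column name -> mask SQL; "{col}" marks the column reference. */
  static String maskedTableSql(String table, Map<String, String> columnMasks) {
    String alias = table.substring(table.lastIndexOf('.') + 1);
    String projection = columnMasks.entrySet().stream()
        .map(e -> e.getValue().replace("{col}", e.getKey()) + " AS " + e.getKey())
        .collect(Collectors.joining(", "));
    return "(SELECT " + projection + " FROM " + table + ") " + alias;
  }

  public static void main(String[] args) {
    Map<String, String> masks = new LinkedHashMap<>();
    masks.put("ssn", "concat('XXX-XX-', substr({col}, -4))");  // custom mask SQL from Ranger
    masks.put("name", "{col}");                                // unmasked column passes through
    // Rewrites the original statement: SELECT ssn, name FROM hr.employees
    System.out.println("SELECT ssn, name FROM " + maskedTableSql("hr.employees", masks));
  }
}
{code}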



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9089) Failed to link impalad on SUSE12

2019-10-31 Thread Donghui Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donghui Xu updated IMPALA-9089:
---
Description: 
Failed to link impalad on SUSE12, as follows:
[100%] Linking CXX executable ../../build/release/service/impalad
/toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
fileread.cc:336
collect2: error: ld returned 1 exit status

CMakeError.log content is as following:
/toolchain/cmake-3.14.3/bin/cmake -E cmake_link_script 
CMakeFiles/cmTC_31214.dir/link.txt --verbose=1
/toolchain/gcc-4.9.2/bin/gcc  -DCHECK_FUNCTION_EXISTS=pthread_create
-rdynamic CMakeFiles/cmTC_31214.dir/CheckFunctionExists.c.o  -o cmTC_31214 
-lpthreads 
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status

  was:
Failed to link impalad on SUSE12, as follows:
[100%] Linking CXX executable ../../build/release/service/impalad
/toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
fileread.cc:336
collect2: error: ld returned 1 exit status

CMakeError.log context is as following:
/toolchain/cmake-3.14.3/bin/cmake -E cmake_link_script 
CMakeFiles/cmTC_31214.dir/link.txt --verbose=1
/toolchain/gcc-4.9.2/bin/gcc  -DCHECK_FUNCTION_EXISTS=pthread_create
-rdynamic CMakeFiles/cmTC_31214.dir/CheckFunctionExists.c.o  -o cmTC_31214 
-lpthreads 
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status


> Failed to link impalad on SUSE12
> 
>
> Key: IMPALA-9089
> URL: https://issues.apache.org/jira/browse/IMPALA-9089
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Donghui Xu
>Priority: Minor
>
> Failed to link impalad on SUSE12, as follows:
> [100%] Linking CXX executable ../../build/release/service/impalad
> /toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
> fileread.cc:336
> collect2: error: ld returned 1 exit status
> CMakeError.log content is as following:
> /toolchain/cmake-3.14.3/bin/cmake -E cmake_link_script 
> CMakeFiles/cmTC_31214.dir/link.txt --verbose=1
> /toolchain/gcc-4.9.2/bin/gcc  -DCHECK_FUNCTION_EXISTS=pthread_create
> -rdynamic CMakeFiles/cmTC_31214.dir/CheckFunctionExists.c.o  -o cmTC_31214 
> -lpthreads 
> /usr/bin/ld: cannot find -lpthreads
> collect2: error: ld returned 1 exit status



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9089) Failed to link impalad on SUSE12

2019-10-31 Thread Donghui Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donghui Xu updated IMPALA-9089:
---
Description: 
Failed to link impalad on SUSE12, as follows:
[100%] Linking CXX executable ../../build/release/service/impalad
/toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
fileread.cc:336
collect2: error: ld returned 1 exit status

CMakeError.log context is as following:
/toolchain/cmake-3.14.3/bin/cmake -E cmake_link_script 
CMakeFiles/cmTC_31214.dir/link.txt --verbose=1
/toolchain/gcc-4.9.2/bin/gcc  -DCHECK_FUNCTION_EXISTS=pthread_create
-rdynamic CMakeFiles/cmTC_31214.dir/CheckFunctionExists.c.o  -o cmTC_31214 
-lpthreads 
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status

  was:
Failed to link impalad on SUSE12, as follows:
[100%] Linking CXX executable ../../build/release/service/impalad
/toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
fileread.cc:336
collect2: error: ld returned 1 exit status



> Failed to link impalad on SUSE12
> 
>
> Key: IMPALA-9089
> URL: https://issues.apache.org/jira/browse/IMPALA-9089
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Donghui Xu
>Priority: Minor
>
> Failed to link impalad on SUSE12, as follows:
> [100%] Linking CXX executable ../../build/release/service/impalad
> /toolchain/binutils-2.26.1/bin/ld.gold: internal error in find_view, at 
> fileread.cc:336
> collect2: error: ld returned 1 exit status
> CMakeError.log context is as following:
> /toolchain/cmake-3.14.3/bin/cmake -E cmake_link_script 
> CMakeFiles/cmTC_31214.dir/link.txt --verbose=1
> /toolchain/gcc-4.9.2/bin/gcc  -DCHECK_FUNCTION_EXISTS=pthread_create
> -rdynamic CMakeFiles/cmTC_31214.dir/CheckFunctionExists.c.o  -o cmTC_31214 
> -lpthreads 
> /usr/bin/ld: cannot find -lpthreads
> collect2: error: ld returned 1 exit status



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3933) Time zone definitions of Hive/Spark and Impala differ for historical dates

2019-10-31 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963707#comment-16963707
 ] 

Manish Maheshwari commented on IMPALA-3933:
---

This is the current behaviour on impala-2.11

Issue 1 - 
{code:java}
We will get the below error if a date < 1400/01/01 is inserted. WARNINGS: 
Parquet file 'hdfs://ns1/user/hive/warehouse/abc/00_0_copy_5' column 'ts' 
contains an out of range timestamp. The valid date range is 
1400-01-01..9999-12-31. {code}
Issue 2 and workaround using Hive UDFs in Impala
{code:java}

1) Set these in Impala - 
-convert_legacy_hive_parquet_utc_timestamps=true
-use_local_tz_for_unix_timestamp_conversions=true

2) In Hive - 

beeline>create table abc(ts timestamp) stored as parquet;
beeline>insert into abc values ('1400-12-12 00:00:00');
beeline>insert into abc values ('1400-9-12 00:00:00');
beeline>insert into abc values ('1500-9-12 00:00:00');
beeline>insert into abc values ('1500-10-12 00:00:00');

beeline>select * from abc;
+------------------------+
| abc.ts                 |
+------------------------+
| 1400-12-12 00:00:00.0  |
| 1400-09-12 00:00:00.0  |
| 1500-09-12 00:00:00.0  |
| 1500-10-12 00:00:00.0  |
+------------------------+

impala-shell>invalidate metadata;

impala-shell>select * from abc; 

## This is not the right output
+---------------------+
| ts                  |
+---------------------+
| 1400-09-21 08:00:00 |
| 1500-09-22 08:00:00 |
| 1400-12-21 08:00:00 |
| 1500-10-22 08:00:00 |
+---------------------+
Fetched 4 row(s) in 2.96s

## Now using the Hive UDF in Impala, copy hive-exec-1.1.0-cdh5.x.x.jar as 
/tmp/hive-udf.jar

create function hive_unixtime location '/tmp/hive-udf.jar' 
symbol='org.apache.hadoop.hive.ql.udf.UDFFromUnixTime' 

impala-shell> select hive_unixtime(unix_timestamp(ts),'yyyy/MM/dd HH:mm') as 
yyyy_mm_dd_hh_mm from abc;

+------------------+
| yyyy_mm_dd_hh_mm |
+------------------+
| 1400/09/12 00:00 |
| 1500/09/12 00:00 |
| 1400/12/12 00:00 |
| 1500/10/12 00:00 |
+------------------+
Fetched 4 row(s) in 0.43s
 {code}
 

 

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: impala 2.3
>Reporter: Adriano Simone
>Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper conversion) when reading dates earlier than roughly 1900 
> (the exact cutoff date is unclear).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps look correct, and the DST differences are visible (Hive 
> inserted the values in local time, while Impala shows them in UTC).
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by