[jira] [Created] (IMPALA-10780) Iceberg in Impala should support combination storage

2021-07-08 Thread Yong Yang (Jira)
Yong Yang created IMPALA-10780:
--

 Summary: Iceberg in Impala should support combination storage
 Key: IMPALA-10780
 URL: https://issues.apache.org/jira/browse/IMPALA-10780
 Project: IMPALA
  Issue Type: Improvement
Reporter: Yong Yang


Currently, the filesystem of the metadata path is used to check the data files, 
which blocks the following scenario:
 # metadata is on HDFS
 # data is on s3a or another object store.

 

The following code in FeIcebergTable.Utils fails for this combination, because 
the FileSystem is resolved from tableLoc but then used to look up fileLoc:


private static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
    Path tableLoc, ListMap hostIndex) throws IOException {
  FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
  FileStatus fileStatus = fs.getFileStatus(fileLoc);
  return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
}
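
The mismatch can be illustrated outside Impala. The sketch below is Python rather than the Impala frontend code, and filesystem_key as well as both example locations are hypothetical. It only shows that a filesystem chosen from the table (metadata) location differs from the one the data file actually lives on, whereas one chosen from the file's own location matches.

```python
from urllib.parse import urlparse


def filesystem_key(location):
    """Return the (scheme, authority) pair that identifies a filesystem."""
    parsed = urlparse(location)
    return (parsed.scheme, parsed.netloc)


# Hypothetical locations: metadata on HDFS, data on an object store.
table_loc = "hdfs://namenode:8020/warehouse/ice_tbl"
file_loc = "s3a://bucket/warehouse/ice_tbl/data/00000-0-data.parquet"

# Behaviour described in the report: filesystem derived from the table location.
fs_from_table = filesystem_key(table_loc)
# Possible direction for a fix: derive it from the data file's own location.
fs_from_file = filesystem_key(file_loc)

print(fs_from_table)
print(fs_from_file)
print(fs_from_table == fs_from_file)
```

In this sketch the two keys differ, which is the situation in which a status lookup for the data file is sent to the wrong filesystem.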

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10780) Iceberg in Impala should support combination storage

2021-07-08 Thread Yong Yang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377332#comment-17377332
 ] 

Yong Yang commented on IMPALA-10780:


I can help change the code to support the scenario. 

> Iceberg in Impala should support combination storage
> 
>
> Key: IMPALA-10780
> URL: https://issues.apache.org/jira/browse/IMPALA-10780
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Yong Yang
>Priority: Major
>
> Currently, the filesystem of the metadata path is used to check the data 
> files, which blocks the following scenario:
>  # metadata is on HDFS
>  # data is on s3a or another object store.
>  
> The following code in FeIcebergTable.Utils fails for this combination:
> private static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
>     Path tableLoc, ListMap hostIndex) throws IOException {
>   FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
>   FileStatus fileStatus = fs.getFileStatus(fileLoc);
>   return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
> }
>  






[jira] [Updated] (IMPALA-10780) Iceberg in Impala should support combination storage

2021-07-08 Thread Yong Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Yang updated IMPALA-10780:
---
Description: 
Currently, the filesystem of the metadata path is used to check the data files, 
which blocks the following scenario:
 # metadata is on HDFS
 # data is on s3a or another object store.

 

The following code in FeIcebergTable.Utils fails for this combination:

private static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
    Path tableLoc, ListMap hostIndex) throws IOException {
  FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
  FileStatus fileStatus = fs.getFileStatus(fileLoc);
  return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
}

 

  was:
Currently, the filesystem of the metadata path is used to check the data file, 
that is blocking the following scenario:
 # metadata is on hdfs
 # data is on s3a or other object store.

 

Following code in FeIcebergTable.Utils fails this combination:


private static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
    Path tableLoc, ListMap hostIndex) throws IOException {
  FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
  FileStatus fileStatus = fs.getFileStatus(fileLoc);
  return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
}

 


> Iceberg in Impala should support combination storage
> 
>
> Key: IMPALA-10780
> URL: https://issues.apache.org/jira/browse/IMPALA-10780
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Yong Yang
>Priority: Major
>
> Currently, the filesystem of the metadata path is used to check the data 
> files, which blocks the following scenario:
>  # metadata is on HDFS
>  # data is on s3a or another object store.
>  
> The following code in FeIcebergTable.Utils fails for this combination:
> private static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
>     Path tableLoc, ListMap hostIndex) throws IOException {
>   FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
>   FileStatus fileStatus = fs.getFileStatus(fileLoc);
>   return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
> }
>  






[jira] [Assigned] (IMPALA-9873) Skip decoding of non-materialised columns in Parquet

2021-07-08 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor reassigned IMPALA-9873:
-

Assignee: Amogh Margoor

> Skip decoding of non-materialised columns in Parquet
> 
>
> Key: IMPALA-9873
> URL: https://issues.apache.org/jira/browse/IMPALA-9873
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Amogh Margoor
>Priority: Major
>
> This is a first milestone for lazy materialization in parquet, focusing on 
> avoiding decompression and decoding of columns.
> * Identify columns referenced by predicates and runtime row filters and 
> determine what order the columns need to be materialised in. Probably we want 
> to evaluate static predicates before runtime filters to match current 
> behaviour.
> * Rework this loop so that it alternates between materialising columns and 
> evaluating predicates: 
> https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure, 
> e.g. bitmap
> * We need to then check that bitmap at each step to see if we skip 
> materialising part or all of the following columns. E.g. if the first N rows 
> were pruned, we can skip forward the remaining readers N rows.
> * This part may be a little tricky - there is the risk of adding overhead 
> compared to the current code.
> * It is probably OK to just materialise the partition columns to start off 
> with - avoiding materialising those is not going to buy that much.
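
The steps listed in the description can be sketched in a few lines. This is a toy Python model, not the C++ scanner: lazy_scan, its arguments, and the list of booleans standing in for the bitmap are all hypothetical, and the decode callback merely stands in for decompression and decoding.

```python
def lazy_scan(columns, predicates, decode):
    """Materialise predicate columns first, then only surviving rows of the rest.

    columns: dict mapping column name -> list of encoded values.
    predicates: dict mapping column name -> function(value) -> bool.
    decode: function(encoded_value) -> value; stands in for decompress/decode.
    """
    num_rows = len(next(iter(columns.values())))
    selected = [True] * num_rows  # the "bitmap" of rows that are still alive
    out = {}

    # Alternate between materialising a predicate column and evaluating it.
    for name, pred in predicates.items():
        values = [None] * num_rows
        for i in range(num_rows):
            if selected[i]:
                values[i] = decode(columns[name][i])
                selected[i] = pred(values[i])
        out[name] = values

    # Materialise the remaining columns only for rows that survived.
    for name, encoded in columns.items():
        if name in predicates:
            continue
        out[name] = [decode(encoded[i]) if selected[i] else None
                     for i in range(num_rows)]

    return [{name: out[name][i] for name in columns}
            for i in range(num_rows) if selected[i]]


decoded = []  # records every value that was actually decoded


def counting_decode(value):
    decoded.append(value)
    return value


cols = {"a": [1, 5, 9, 12], "b": ["w", "x", "y", "z"]}
rows = lazy_scan(cols, {"a": lambda v: v > 6}, counting_decode)
print(rows)          # [{'a': 9, 'b': 'y'}, {'a': 12, 'b': 'z'}]
print(len(decoded))  # 6 values decoded instead of 8
```

The per-row check of the bitmap is also where the overhead risk mentioned in the description would show up; a real implementation would presumably skip in runs rather than row by row.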






[jira] [Created] (IMPALA-10781) Avoid nested loop join when there is OR in the join condition

2021-07-08 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-10781:


 Summary: Avoid nested loop join when there is OR in the join 
condition
 Key: IMPALA-10781
 URL: https://issues.apache.org/jira/browse/IMPALA-10781
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend, Frontend
Reporter: Csaba Ringhofer


The following query becomes a nested loop join in Impala:
{code}
SELECT * FROM t1 JOIN  t2 ON t1_col1 = t2_col1 OR t1_col2 = t2_col2;
{code}

A possible solution is to rewrite the join into a union of two joins, where 
each join becomes an equi-join. Currently this has to be done by hand.

It is possible to create a more efficient solution that doesn't need to reread 
the right side of the join, by adding an operator that duplicates rows and adds 
an extra column that identifies the join condition.
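
The manual rewrite can be checked on a small example. The sketch below uses Python's built-in SQLite rather than Impala, and the table contents are made up. One detail is an addition not spelled out above: the second branch excludes rows already matched by the first condition, so that UNION ALL does not return them twice (a plain UNION would instead drop legitimate duplicate rows).

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (t1_col1 INTEGER, t1_col2 INTEGER);
    CREATE TABLE t2 (t2_col1 INTEGER, t2_col2 INTEGER);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (1, 10);
    INSERT INTO t2 VALUES (1, 10), (2, 99), (7, 30), (NULL, 20);
""")

or_join = """
    SELECT * FROM t1 JOIN t2 ON t1_col1 = t2_col1 OR t1_col2 = t2_col2
"""

# Each branch has an equality conjunct, so each can be planned as an equi-join.
union_of_equi_joins = """
    SELECT * FROM t1 JOIN t2 ON t1_col1 = t2_col1
    UNION ALL
    SELECT * FROM t1 JOIN t2
      ON t1_col2 = t2_col2
     AND (t1_col1 <> t2_col1 OR t1_col1 IS NULL OR t2_col1 IS NULL)
"""

# Compare as multisets so that duplicate rows are counted, not collapsed.
or_rows = Counter(conn.execute(or_join).fetchall())
union_rows = Counter(conn.execute(union_of_equi_joins).fetchall())
print(or_rows == union_rows)
```

Note that this rewrite reads both tables twice, which is the cost the row-duplicating operator proposed above is meant to avoid.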







[jira] [Created] (IMPALA-10782) Incorrect handling of failure when starting AllocationFileLoaderService

2021-07-08 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-10782:


 Summary: Incorrect handling of failure when starting 
AllocationFileLoaderService
 Key: IMPALA-10782
 URL: https://issues.apache.org/jira/browse/IMPALA-10782
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Csaba Ringhofer


At 
https://github.com/apache/impala/blob/59d32853ee42886ae683aac95a8be7f9c89b8eb7/fe/src/main/java/org/apache/impala/util/RequestPoolService.java#L243
we call stopInternal if there is an issue with allocLoader_, and then we try to 
stop confWatcher_ too, even though it hasn't been started yet. This leads to a 
failed precondition at 
https://github.com/apache/impala/blob/59d32853ee42886ae683aac95a8be7f9c89b8eb7/fe/src/main/java/org/apache/impala/util/FileWatchService.java#L136
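
One general way to avoid this class of problem is to stop only what was actually started. The sketch below is Python, not the Java code in RequestPoolService; the Service class and start_all are hypothetical, and whether the service that failed to start also needs cleanup is left open.

```python
class Service:
    """Hypothetical stand-in for a service whose stop() requires a prior start()."""

    def __init__(self, name, fail_on_start=False):
        self.name = name
        self.fail_on_start = fail_on_start
        self.running = False

    def start(self):
        if self.fail_on_start:
            raise RuntimeError(self.name + " failed to start")
        self.running = True

    def stop(self):
        # Mirrors a precondition that stop() is only legal on a running service.
        if not self.running:
            raise AssertionError(self.name + " was never started")
        self.running = False


def start_all(services):
    """Start services in order; on failure, stop only those already started."""
    started = []
    try:
        for svc in services:
            svc.start()
            started.append(svc)
    except Exception:
        for svc in reversed(started):
            svc.stop()
        raise


alloc_loader = Service("alloc_loader", fail_on_start=True)
conf_watcher = Service("conf_watcher")
startup_error = None
try:
    start_all([alloc_loader, conf_watcher])
except RuntimeError as e:
    startup_error = e
print(startup_error)  # alloc_loader failed to start
```

Here the failure of the first service surfaces as the original error, and the never-started second service is not asked to stop.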






[jira] [Commented] (IMPALA-10729) Impala Doc: Document SHA1 and SHA2 functions introduced in IMPALA-10679

2021-07-08 Thread shajini thayasingh (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377534#comment-17377534
 ] 

shajini thayasingh commented on IMPALA-10729:
-

[~amargoor] I documented this sometime on June 7th. I added a new section to 
describe the changes. You can view it here 
https://gerrit.cloudera.org/#/c/17551/ 

> Impala Doc: Document SHA1 and SHA2 functions introduced in IMPALA-10679 
> 
>
> Key: IMPALA-10729
> URL: https://issues.apache.org/jira/browse/IMPALA-10729
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Assignee: shajini thayasingh
>Priority: Major
>
> IMPALA-10679 introduced the SHA1 and SHA2 functions (specifically SHA224, 
> SHA256, SHA384 and SHA512), which need to be documented.
> 1. In FIPS mode, SHA1, SHA224 and SHA256 will throw an error.
> 2. SHA1(s) takes a string as its argument. If the argument is null, it returns 
> null; otherwise it returns the SHA1 digest.
> 3. SHA2(s, bit_length) takes a string s and an integer bit_length. Supported 
> bit_length values are 224, 256, 384 and 512. Passing an unsupported bit length 
> throws an error. If either argument is null, null is returned.
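
The argument handling described in points 2 and 3 can be modelled with Python's hashlib. This is only a sketch of the described semantics, not Impala's implementation: the hex-string output format and the choice to check for null before checking the bit length are assumptions, and FIPS mode is not modelled.

```python
import hashlib

_SHA2_BY_BITS = {224: hashlib.sha224, 256: hashlib.sha256,
                 384: hashlib.sha384, 512: hashlib.sha512}


def sha1(s):
    """None (NULL) in, None out; otherwise the SHA-1 digest as hex (assumed format)."""
    if s is None:
        return None
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def sha2(s, bit_length):
    """SHA-2 digest of s for bit_length in {224, 256, 384, 512}."""
    # Assumption: a null argument wins over an unsupported bit length.
    if s is None or bit_length is None:
        return None
    if bit_length not in _SHA2_BY_BITS:
        raise ValueError("unsupported bit length: %d" % bit_length)
    return _SHA2_BY_BITS[bit_length](s.encode("utf-8")).hexdigest()


print(sha1(None))               # None
print(len(sha2("abc", 256)))    # 64 hex characters, i.e. 256 bits
print(sha2("abc", None))        # None
try:
    sha2("abc", 100)
except ValueError as e:
    print(e)                    # unsupported bit length: 100
```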






[jira] [Commented] (IMPALA-10679) Create SHA2 builtin function

2021-07-08 Thread shajini thayasingh (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377537#comment-17377537
 ] 

shajini thayasingh commented on IMPALA-10679:
-

The doc is taken care of!

> Create SHA2 builtin function
> 
>
> Key: IMPALA-10679
> URL: https://issues.apache.org/jira/browse/IMPALA-10679
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: newbie, ramp-up
>
> Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, 
> and SHA-512).
> Hive already supports SHA2: HIVE-10644
> We should add a similar builtin function.
>  






[jira] [Assigned] (IMPALA-10739) Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables

2021-07-08 Thread Attila Jeges (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges reassigned IMPALA-10739:
-

Assignee: Attila Jeges

> Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables
> -
>
> Key: IMPALA-10739
> URL: https://issues.apache.org/jira/browse/IMPALA-10739
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
>
> Impala should support partition evolution for Iceberg tables, i.e. it should 
> be able to set a new partition spec for an Iceberg table via DDL.
> The command should be
> {noformat}
> ALTER TABLE  SET PARTITION SPEC()
> {noformat}
> to be aligned with Hive.






[jira] [Created] (IMPALA-10783) run_and_verify_query_cancellation_test flakiness and improper error handling in TestImpalaShell

2021-07-08 Thread Bikramjeet Vig (Jira)
Bikramjeet Vig created IMPALA-10783:
---

 Summary: run_and_verify_query_cancellation_test flakiness and 
improper error handling in TestImpalaShell
 Key: IMPALA-10783
 URL: https://issues.apache.org/jira/browse/IMPALA-10783
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.0
Reporter: Bikramjeet Vig
Assignee: Bikramjeet Vig


Some tests in TestImpalaShell run impala-shell in a separate process but don't 
handle the case where the test fails and the impala-shell process lingers 
on.

One such test, run_and_verify_query_cancellation_test, failed due to flakiness, 
and since it ran a query that returned a large result, the impala-shell process 
lingered on while fetching results. This caused the query to hold on to 
resources and starve the cluster of memory, which caused other tests to fail 
because not enough memory was available.

The flakiness in run_and_verify_query_cancellation_test was:
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:414:
 in test_query_cancellation_during_wait_to_finish
self.run_and_verify_query_cancellation_test(vector, stmt, "RUNNING")
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:422:
 in run_and_verify_query_cancellation_test
wait_for_query_state(vector, stmt, cancel_at_state)
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/util.py:330:
 in wait_for_query_state
raise Exception(exc_text)
E   Exception: The found in flight query is not the one under test: set all
{noformat}
The test checked for running queries too soon, while impala-shell was still 
starting up. impala-shell runs "set all" when it starts; the test picked that 
up as the in-flight query and raised an error because it was not the query 
under test.
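
One possible way to make the check tolerant of the shell's startup statements is to skip in-flight queries that do not match the statement under test and keep polling until a deadline. The sketch below is not the code in tests/shell/util.py: wait_for_query and the list_in_flight callable are hypothetical, and it only waits for the query to appear rather than for a particular state.

```python
import time


def wait_for_query(list_in_flight, stmt, timeout_s=10.0, poll_s=0.1):
    """Poll until an in-flight query whose statement equals stmt shows up.

    list_in_flight: hypothetical callable returning a list of
    (query_id, statement) tuples for the queries currently in flight.
    Unrelated queries, such as the shell's own "set all", are skipped
    instead of being treated as an error.
    """
    wanted = stmt.strip().lower()
    deadline = time.monotonic() + timeout_s
    while True:
        for query_id, statement in list_in_flight():
            if statement.strip().lower() == wanted:
                return query_id
        if time.monotonic() >= deadline:
            raise TimeoutError("query under test never appeared: " + stmt)
        time.sleep(poll_s)


# Fake snapshots: first only the startup statement, then the real query too.
snapshots = iter([
    [("q1", "set all")],
    [("q1", "set all"), ("q2", "select sleep(10000)")],
])
found = wait_for_query(lambda: next(snapshots), "select sleep(10000)", poll_s=0.0)
print(found)  # q2
```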

The lingering query caused other tests to fail with errors 
like:

{noformat}
query_test/test_tpcds_queries.py:107: in test_tpcds_q18a
self.run_test_case(self.get_workload() + '-q18a', vector)
common/impala_test_suite.py:678: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:616: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:936: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:189: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:367: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:388: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E   Query aborted:Failed to get minimum memory reservation of 452.19 MB on 
daemon impala-ec2-centos74-m5-4xlarge-ondemand-191d.vpc.cloudera.com:27002 for 
query 394b7f96d554f99c:6882496c due to following error: Failed to 
increase reservation by 452.19 MB because it would exceed the applicable 
reservation limit for the "Process" ReservationTracker: reservation_limit=10.20 
GB reservation=9.91 GB used_reservation=0 child_reservations=9.91 GB
E   The top 5 queries that allocated memory under this tracker are:
E   Query(fa4ece9474a3f865:1b284e67): Reservation=9.60 GB 
ReservationLimit=9.60 GB OtherMemory=118.01 MB Total=9.71 GB Peak=9.71 GB
E   Query(534d07950247ae68:6f5a410d): Reservation=123.50 MB 
ReservationLimit=9.60 GB OtherMemory=2.68 MB Total=126.18 MB Peak=317.02 MB
E   Query(2e4f087aa8263e23:e697d8e8): Reservation=50.81 MB 
ReservationLimit=9.60 GB OtherMemory=42.62 MB Total=93.43 MB Peak=173.74 MB
E   Query(6e459d892dfa5050:5959219b): Reservation=28.88 MB 
ReservationLimit=9.60 GB OtherMemory=18.77 MB Total=47.64 MB Peak=53.11 MB
E   Query(ad455bea2e0adc64:2b0bbf35): Reservation=17.94 MB 
ReservationLimit=9.60 GB OtherMemory=15.22 MB Total=33.16 MB Peak=163.99 MB
E   
E   
E   
E   
E   
E   Memory is likely oversubscribed. Reducing query concurrency or configuring 
admission control may help avoid this error.
{noformat}

Logs confirmed that fa4ece9474a3f865:1b284e67 is the query id of the 
query that run_and_verify_query_cancellation_test ran.
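
A common pattern for the lingering-process part of the problem is to tie the child's lifetime to a context manager, so that it is terminated even when the test body raises. The sketch below is a generic Python illustration, not the TestImpalaShell code; managed_process is a hypothetical helper and a sleeping Python child stands in for impala-shell. Whether killing the client also releases the query's resources on the server is a separate question that this sketch does not address.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(args, kill_timeout_s=5.0):
    """Run a child process and guarantee it is gone when the block exits."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        yield proc
    finally:
        if proc.poll() is None:  # still running, e.g. because the test body raised
            proc.terminate()
            try:
                proc.wait(timeout=kill_timeout_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        # Close the pipes so that no file descriptors leak.
        for stream in (proc.stdout, proc.stderr):
            if stream is not None:
                stream.close()


# A long-running child stands in for an impala-shell fetching a large result.
child_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
try:
    with managed_process(child_cmd) as proc:
        raise AssertionError("simulated test failure")
except AssertionError:
    pass
print(proc.poll() is not None)  # True: the child did not outlive the failed test
```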






[jira] [Created] (IMPALA-10784) Add support for retaining cookies among http requests in impala-shell

2021-07-08 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10784:


 Summary: Add support for retaining cookies among http requests in 
impala-shell
 Key: IMPALA-10784
 URL: https://issues.apache.org/jira/browse/IMPALA-10784
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Affects Versions: Impala 4.1
Reporter: Wenzhe Zhou


IMPALA-10234 added support for cookie authentication to impala-shell. However, 
it does not accept user-specified cookie names, and it retains only one cookie.

We need to make cookie support in impala-shell more generic. We should 
allow users to specify cookie names via startup flags, and make impala-shell 
retain more than one cookie.
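
The requested behaviour can be sketched with Python's standard http.cookies module. This is not impala-shell code: CookieStore is a hypothetical class, the cookie names in the usage example are made up, and expiry and other cookie attributes are ignored.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Keeps the latest value of every cookie whose name is on an allow-list."""

    def __init__(self, cookie_names):
        # In the proposal, these names would come from a startup flag.
        self.cookie_names = set(cookie_names)
        self.cookies = {}

    def update_from_response(self, set_cookie_headers):
        """set_cookie_headers: list of raw Set-Cookie header values."""
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)
            for name, morsel in parsed.items():
                if name in self.cookie_names:
                    self.cookies[name] = morsel.value

    def request_header(self):
        """Value for the Cookie header of the next request, or None."""
        if not self.cookies:
            return None
        return "; ".join("%s=%s" % (name, value)
                         for name, value in sorted(self.cookies.items()))


store = CookieStore(["impala.auth", "JSESSIONID"])
store.update_from_response([
    "impala.auth=abc123; Path=/",
    "JSESSIONID=xyz789; Path=/",
    "tracking=ignored",
])
print(store.request_header())  # JSESSIONID=xyz789; impala.auth=abc123
```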

 






[jira] [Assigned] (IMPALA-10784) Add support for retaining cookies among http requests in impala-shell

2021-07-08 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-10784:


Assignee: Wenzhe Zhou

> Add support for retaining cookies among http requests in impala-shell
> -
>
> Key: IMPALA-10784
> URL: https://issues.apache.org/jira/browse/IMPALA-10784
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 4.1
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> IMPALA-10234 added support for cookie authentication to impala-shell. However, 
> it does not accept user-specified cookie names, and it retains only one cookie.
> We need to make cookie support in impala-shell more generic. We should 
> allow users to specify cookie names via startup flags, and make impala-shell 
> retain more than one cookie.
>  






[jira] [Closed] (IMPALA-9696) Evaluate complex expression in scan node to reduce the number of calculations

2021-07-08 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He closed IMPALA-9696.
---
Resolution: Fixed

> Evaluate complex expression in scan node to reduce the number of calculations
> -
>
> Key: IMPALA-9696
> URL: https://issues.apache.org/jira/browse/IMPALA-9696
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Minor
>



