[jira] [Updated] (IMPALA-9664) Insert events on transactional tables need to call addWriteNotificationLog API

2020-04-23 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9664:

Priority: Critical  (was: Major)

> Insert events on transactional tables need to call addWriteNotificationLog API
> --
>
> Key: IMPALA-9664
> URL: https://issues.apache.org/jira/browse/IMPALA-9664
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Xiaomeng Zhang
>Priority: Critical
>
> According to the Hive source code, insert events on transactional tables are 
> fired through a different API, {{addWriteNotificationLog}}. 
> Currently Impala fires {{fireListenerEvent}} for both transactional and 
> non-transactional tables. We should look at how the two APIs differ and 
> see if we need to handle transactional tables differently.
> References:
> https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2402
> https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2236



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9253) Blacklist additional posix error codes for failed DataStreamService RPCs

2020-04-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9253:
---

Assignee: Csaba Ringhofer

> Blacklist additional posix error codes for failed DataStreamService RPCs
> 
>
> Key: IMPALA-9253
> URL: https://issues.apache.org/jira/browse/IMPALA-9253
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Csaba Ringhofer
>Priority: Major
>
> Filing as a follow-up to 
> [IMPALA-9137|http://issues.cloudera.org/browse/IMPALA-9137]. 
> [IMPALA-9137|http://issues.cloudera.org/browse/IMPALA-9137] blacklists a node 
> if an RPC fails with specific posix error codes:
>  * 107 = ENOTCONN: Transport endpoint is not connected
>  * 108 = ESHUTDOWN: Cannot send after transport endpoint shutdown
>  * 111 = ECONNREFUSED: Connection refused
> These codes were produced by running a query, killing a node running that 
> query, and then seeing what error codes the query failed with.
> There may be other error codes that are worth using for node blacklisting as 
> well. One way to come up with more error codes is to use iptables to 
> introduce network faults between Impala processes and see how RPCs fail.
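
The check described above can be sketched as a simple set lookup. This is only an illustration under stated assumptions: the class and method names are hypothetical, the three numeric codes are the Linux values quoted in the issue, and Impala's real RPC handling lives in its C++ backend rather than in Java.

```java
import java.util.Set;

/** Hypothetical helper, not Impala code: decides whether an RPC failure should blacklist a node. */
final class RpcErrorClassifier {
  // POSIX error codes quoted in the issue description (Linux numbering).
  static final int ENOTCONN = 107;      // Transport endpoint is not connected
  static final int ESHUTDOWN = 108;     // Cannot send after transport endpoint shutdown
  static final int ECONNREFUSED = 111;  // Connection refused

  private static final Set<Integer> BLACKLIST_CODES =
      Set.of(ENOTCONN, ESHUTDOWN, ECONNREFUSED);

  /** Returns true if the given POSIX error code is one that triggers blacklisting. */
  static boolean shouldBlacklist(int posixErrorCode) {
    return BLACKLIST_CODES.contains(posixErrorCode);
  }
}
```

Any additional codes found through fault injection would simply be added to the set, which keeps the decision in one place.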






[jira] [Assigned] (IMPALA-9253) Blacklist additional posix error codes for failed DataStreamService RPCs

2020-04-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9253:
---

Assignee: (was: Csaba Ringhofer)

> Blacklist additional posix error codes for failed DataStreamService RPCs
> 
>
> Key: IMPALA-9253
> URL: https://issues.apache.org/jira/browse/IMPALA-9253
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> Filing as a follow-up to 
> [IMPALA-9137|http://issues.cloudera.org/browse/IMPALA-9137]. 
> [IMPALA-9137|http://issues.cloudera.org/browse/IMPALA-9137] blacklists a node 
> if an RPC fails with specific posix error codes:
>  * 107 = ENOTCONN: Transport endpoint is not connected
>  * 108 = ESHUTDOWN: Cannot send after transport endpoint shutdown
>  * 111 = ECONNREFUSED: Connection refused
> These codes were produced by running a query, killing a node running that 
> query, and then seeing what error codes the query failed with.
> There may be other error codes that are worth using for node blacklisting as 
> well. One way to come up with more error codes is to use iptables to 
> introduce network faults between Impala processes and see how RPCs fail.






[jira] [Work started] (IMPALA-9625) Impala's COMPUTE STATS statement generates duplicate ALTER events

2020-04-08 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9625 started by Dinesh Garg.
---
> Impala's COMPUTE STATS statement generates duplicate ALTER events
> -
>
> Key: IMPALA-9625
> URL: https://issues.apache.org/jira/browse/IMPALA-9625
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Dinesh Garg
>Priority: Critical
>
> Impala's COMPUTE STATS statement results in the registration of the ALTER 
> event twice. One is in {{Analyzer#registerAuthAndAuditEvent()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L3131-L3133]
>  and the other is in {{Analyzer#getTable()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2862-L2863].
> In {{registerAuthAndAuditEvent()}}, the corresponding full table name 
> {{table.getFullName()}} is produced by a call to 
> {{Analyzer#resolveTableRef()}} 
> ([https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L352]).
>  The resulting database and table names are both in lowercase.
> However, in {{getTable()}}, the fully-qualified table name is produced by a 
> call to {{Analyzer#getFqTableName()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2836].
>  The resulting database and table names are in their originally unconverted 
> form provided by the user from the Impala shell. Hence, there is no guarantee 
> that the database and table names are both in lowercase.
> Therefore, if a user does not provide lowercase database and table names, the 
> returned full table name from {{registerAuthAndAuditEvent()}} and 
> {{getTable()}} would differ, resulting in duplicate ALTER events for the same 
> table.
> We should at least make the full table name consistent every time we 
> register such an audit event to avoid duplicate entries in the log.
>  
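
A minimal sketch of the idea, assuming events are deduplicated by a string key: if every registration path lowercases the fully-qualified name first, mixed-case input cannot produce a second entry. The class and method names here are hypothetical and do not correspond to Impala's Analyzer code.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical registry, not Impala code: deduplicates audit events by a normalized key. */
final class AuditEventRegistry {
  private final Set<String> events = new LinkedHashSet<>();

  /** Builds a lowercase fully-qualified table name so that all callers agree on its form. */
  static String normalize(String db, String table) {
    return (db + "." + table).toLowerCase(Locale.ROOT);
  }

  /** Registers an event; returns false if the same event was already registered. */
  boolean register(String privilege, String db, String table) {
    return events.add(privilege + ":" + normalize(db, table));
  }

  int size() {
    return events.size();
  }
}
```

With this shape, registering ALTER on "Functional.AllTypes" and then on "functional.alltypes" leaves a single entry.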






[jira] [Assigned] (IMPALA-9625) Impala's COMPUTE STATS statement generates duplicate ALTER events

2020-04-08 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9625:
---

Assignee: Fang-Yu Rao  (was: Dinesh Garg)

> Impala's COMPUTE STATS statement generates duplicate ALTER events
> -
>
> Key: IMPALA-9625
> URL: https://issues.apache.org/jira/browse/IMPALA-9625
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> Impala's COMPUTE STATS statement results in the registration of the ALTER 
> event twice. One is in {{Analyzer#registerAuthAndAuditEvent()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L3131-L3133]
>  and the other is in {{Analyzer#getTable()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2862-L2863].
> In {{registerAuthAndAuditEvent()}}, the corresponding full table name 
> {{table.getFullName()}} is produced by a call to 
> {{Analyzer#resolveTableRef()}} 
> ([https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L352]).
>  The resulting database and table names are both in lowercase.
> However, in {{getTable()}}, the fully-qualified table name is produced by a 
> call to {{Analyzer#getFqTableName()}} at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2836].
>  The resulting database and table names are in their originally unconverted 
> form provided by the user from the Impala shell. Hence, there is no guarantee 
> that the database and table names are both in lowercase.
> Therefore, if a user does not provide lowercase database and table names, the 
> returned full table name from {{registerAuthAndAuditEvent()}} and 
> {{getTable()}} would differ, resulting in duplicate ALTER events for the same 
> table.
> We should at least make the full table name consistent every time we 
> register such an audit event to avoid duplicate entries in the log.
>  






[jira] [Assigned] (IMPALA-8632) Add support for self-event detection for insert events

2020-04-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8632:
---

Assignee: Xiaomeng Zhang  (was: Dinesh Garg)

> Add support for self-event detection for insert events
> --
>
> Key: IMPALA-8632
> URL: https://issues.apache.org/jira/browse/IMPALA-8632
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Xiaomeng Zhang
>Priority: Critical
>
> In case of {{INSERT_EVENTS}} if Impala inserts into a table it causes a 
> refresh to the underlying table/partition. This could be unnecessary when 
> there is only one Impala cluster in the system. The existing self-event 
> detection framework cannot identify such events because they are not sending 
> HMS objects like tables and partitions to the HMS. Instead, in the case of 
> {{INSERT_EVENT}}, the HMS API only asks for a table name or partition value to 
> fire an insert event on it. 
> We can detect a self-event in such cases if the HMS API to fire a listener 
> event is improved to return the event id. This would be used by 
> EventProcessor to ignore the event when it is fetched later in the next 
> polling cycle. In order to support this, we will need to make a change to 
> Hive as well so that the enhanced API can be used.
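
The proposal can be sketched as follows, assuming the enhanced HMS call returns the id of the event it fired. The tracker class and its method names are hypothetical; they only illustrate how an event processor could remember its own event ids and skip them on the next poll.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker, not Impala code: remembers ids of events this cluster fired itself. */
final class SelfEventTracker {
  private final Set<Long> pendingSelfEventIds = new HashSet<>();

  /** Called right after firing an insert event, with the event id returned by HMS (assumed). */
  synchronized void recordSelfEvent(long eventId) {
    pendingSelfEventIds.add(eventId);
  }

  /**
   * Called for each event fetched in a polling cycle. Returns false for a self-event,
   * which is consumed so the id is not kept around forever.
   */
  synchronized boolean shouldProcess(long eventId) {
    return !pendingSelfEventIds.remove(Long.valueOf(eventId));
  }
}
```

Consuming the id on first match keeps the set small, at the cost of assuming each event id is delivered to the processor only once.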






[jira] [Work started] (IMPALA-8632) Add support for self-event detection for insert events

2020-04-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8632 started by Dinesh Garg.
---
> Add support for self-event detection for insert events
> --
>
> Key: IMPALA-8632
> URL: https://issues.apache.org/jira/browse/IMPALA-8632
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Dinesh Garg
>Priority: Critical
>
> In case of {{INSERT_EVENTS}} if Impala inserts into a table it causes a 
> refresh to the underlying table/partition. This could be unnecessary when 
> there is only one Impala cluster in the system. The existing self-event 
> detection framework cannot identify such events because they are not sending 
> HMS objects like tables and partitions to the HMS. Instead, in the case of 
> {{INSERT_EVENT}}, the HMS API only asks for a table name or partition value to 
> fire an insert event on it. 
> We can detect a self-event in such cases if the HMS API to fire a listener 
> event is improved to return the event id. This would be used by 
> EventProcessor to ignore the event when it is fetched later in the next 
> polling cycle. In order to support this, we will need to make a change to 
> Hive as well so that the enhanced API can be used.






[jira] [Assigned] (IMPALA-9529) OR predicates not applied correctly on table masking view

2020-03-23 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9529:
---

Assignee: Quanlong Huang

> OR predicates not applied correctly on table masking view
> -
>
> Key: IMPALA-9529
> URL: https://issues.apache.org/jira/browse/IMPALA-9529
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: correctness
>
> Create a column masking policy on functional_parquet.complextypestbl table: 
> id => 100 * id. The following query has incorrect results:
> {code:sql}
> select id, nested_struct.a from functional_parquet.complextypestbl t
> where id = 100 or nested_struct.a = 1;
> +-----+-----------------+
> | id  | nested_struct.a |
> +-----+-----------------+
> | 100 | 1               |
> | 200 | NULL            |
> | 300 | NULL            |
> | 400 | NULL            |
> | 500 | NULL            |
> | 600 | NULL            |
> | 700 | 7               |
> | 800 | -1              |
> +-----+-----------------+
> {code}
> The EXPLAIN output for the query shows that the predicates are somehow not assigned:
> {code}
> Query: explain select id, nested_struct.a from 
> functional_parquet.complextypestbl t
> where id = 100 or nested_struct.a = 1
> +---------------------------------------------------------------------------------------+
> | Explain String                                                                        |
> +---------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=16.00KB Threads=3                           |
> | Per-Host Resource Estimates: Memory=32MB                                              |
> | WARNING: The following tables are missing relevant table and/or column statistics.    |
> | functional_parquet.complextypestbl                                                    |
> | Analyzed query: SELECT id, nested_struct.a FROM (SELECT CAST(CAST(100 AS BIGINT)      |
> | * id AS BIGINT) id FROM functional_parquet.complextypestbl t) WHERE id =              |
> | CAST(100 AS BIGINT) OR nested_struct.a = CAST(1 AS INT)                               |
> |                                                                                       |
> | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                 |
> | Per-Host Resources: mem-estimate=57.78KB mem-reservation=0B thread-reservation=1      |
> |   PLAN-ROOT SINK                                                                      |
> |   |  output exprs: CAST(CAST(100 AS BIGINT) * id AS BIGINT), nested_struct.a          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
> |   |                                                                                   |
> |   01:EXCHANGE [UNPARTITIONED]                                                         |
> |      mem-estimate=57.78KB mem-reservation=0B thread-reservation=0                     |
> |      tuple-ids=0 row-size=12B cardinality=4.40K                                       |
> |      in pipelines: 00(GETNEXT)                                                        |
> |                                                                                       |
> | F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2                                        |
> | Per-Host Resources: mem-estimate=32.00MB mem-reservation=16.00KB thread-reservation=2 |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
> |   00:SCAN HDFS [functional_parquet.complextypestbl t, RANDOM]                         |
> |      HDFS partitions=1/1 files=2 size=6.92KB                                          |
> |      stored statistics:                                                               |
> |        table: rows=unavailable size=unavailable                                       |
> |        columns missing stats: id                                                      |
> |      extrapolated-rows=disabled max-scan-range-rows=unavailable                       |
> |      mem-estimate=32.00MB mem-reservation=16.00KB thread-reservation=1                |
> |      tuple-ids=0 row-size=12B cardinality=4.40K                                       |
> |      in pipelines: 00(GETNEXT)                                                        |
> +---------------------------------------------------------------------------------------+
> {code}





[jira] [Assigned] (IMPALA-9332) Investigate and use the new batch listing API from HDFS-13616

2020-02-11 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9332:
---

Assignee: Quanlong Huang  (was: Vihang Karajgaonkar)

> Investigate and use the new batch listing API from HDFS-13616
> -
>
> Key: IMPALA-9332
> URL: https://issues.apache.org/jira/browse/IMPALA-9332
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Quanlong Huang
>Priority: Critical
>
> HDFS-13616 provides a new batch listing API which can potentially speed up 
> the file listing on HDFS tables when reloading the table file metadata. We 
> should investigate if this API is helpful for Impala and use it if there are 
> any performance benefits.






[jira] [Assigned] (IMPALA-8870) Bump guava version when building against Hive 3

2020-02-10 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8870:
---

Assignee: Fang-Yu Rao  (was: Vihang Karajgaonkar)

> Bump guava version when building against Hive 3
> ---
>
> Key: IMPALA-8870
> URL: https://issues.apache.org/jira/browse/IMPALA-8870
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Fang-Yu Rao
>Priority: Blocker
>
> Guava is pinned to 14.0.1: 
> https://github.com/apache/impala/blob/8094811/impala-parent/pom.xml#L59
> {code}
> 
> 14.0.1
> {code}
> I think this has likely changed in Hive 3 and we probably want to revisit 
> this.






[jira] [Assigned] (IMPALA-9350) Ranger audits for column masking not produced

2020-01-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9350:
---

Assignee: Fang-Yu Rao  (was: Dinesh Garg)

> Ranger audits for column masking not produced
> -
>
> Key: IMPALA-9350
> URL: https://issues.apache.org/jira/browse/IMPALA-9350
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Fang-Yu Rao
>Priority: Critical
> Attachments: Ranger Audit Events.png
>
>
> The audits for applying Ranger column masking policies are missing.
> Here are audit events for a query "SELECT * FROM default.sample_07" executed 
> by Hive and Impala.
> !Ranger Audit Events.png|width=1259,height=327!
> Policy 37 is a column masking policy on table default.sample_07. We should 
> produce the audit event when it's applied.






[jira] [Work started] (IMPALA-9350) Ranger audits for column masking not produced

2020-01-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9350 started by Dinesh Garg.
---
> Ranger audits for column masking not produced
> -
>
> Key: IMPALA-9350
> URL: https://issues.apache.org/jira/browse/IMPALA-9350
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Dinesh Garg
>Priority: Critical
> Attachments: Ranger Audit Events.png
>
>
> The audits for applying Ranger column masking policies are missing.
> Here are audit events for a query "SELECT * FROM default.sample_07" executed 
> by Hive and Impala.
> !Ranger Audit Events.png|width=1259,height=327!
> Policy 37 is a column masking policy on table default.sample_07. We should 
> produce the audit event when it's applied.






[jira] [Assigned] (IMPALA-9350) Ranger audits for column masking not produced

2020-01-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9350:
---

Assignee: Fang-Yu Rao  (was: Quanlong Huang)

> Ranger audits for column masking not produced
> -
>
> Key: IMPALA-9350
> URL: https://issues.apache.org/jira/browse/IMPALA-9350
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> The audits for applying Ranger column masking policies are missing.






[jira] [Assigned] (IMPALA-8632) Add support for self-event detection for insert events

2020-01-21 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8632:
---

Assignee: Xiaomeng Zhang  (was: Vihang Karajgaonkar)

> Add support for self-event detection for insert events
> --
>
> Key: IMPALA-8632
> URL: https://issues.apache.org/jira/browse/IMPALA-8632
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Xiaomeng Zhang
>Priority: Critical
>
> In case of {{INSERT_EVENTS}} if Impala inserts into a table it causes a 
> refresh to the underlying table/partition. This could be unnecessary when 
> there is only one Impala cluster in the system. The existing self-event 
> detection framework cannot identify such events because they are not sending 
> HMS objects like tables and partitions to the HMS. Instead, in the case of 
> {{INSERT_EVENT}}, the HMS API only asks for a table name or partition value to 
> fire an insert event on it. 
> We can detect a self-event in such cases if the HMS API to fire a listener 
> event is improved to return the event id. This would be used by 
> EventProcessor to ignore the event when it is fetched later in the next 
> polling cycle. In order to support this, we will need to make a change to 
> Hive as well so that the enhanced API can be used.






[jira] [Updated] (IMPALA-8444) Analysis perf regression after IMPALA-7616

2019-12-17 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8444:

Description: 
* The patch for IMPALA-7616 caused a performance regression in analysis time 
when run in an environment with ~1k roles and 10.5k privileges. The regression 
is evident when run as a role that has a large number of privileges.

Following is the stack to look for when jstacking the coordinator.

{noformat}
"Thread-21" #49 prio=5 os_prio=0 tid=0x0cd6e000 nid=0x6a3d runnable [0x7fa28e4a]
   java.lang.Thread.State: RUNNABLE
        at java.lang.String.toLowerCase(String.java:2670)
        at org.apache.impala.catalog.PrincipalPrivilege.buildPrivilegeName(PrincipalPrivilege.java:82)
        at org.apache.impala.catalog.PrincipalPrivilege.getName(PrincipalPrivilege.java:143)
        at org.apache.impala.catalog.AuthorizationPolicy.listPrivileges(AuthorizationPolicy.java:423)
        - locked <0x7fa376987100> (a org.apache.impala.catalog.AuthorizationPolicy)
        at org.apache.impala.catalog.AuthorizationPolicy.listPrivileges(AuthorizationPolicy.java:443)
        - locked <0x7fa376987100> (a org.apache.impala.catalog.AuthorizationPolicy)
        at org.apache.sentry.provider.cache.SimpleCacheProviderBackend.getPrivileges(SimpleCacheProviderBackend.java:75)
        at org.apache.sentry.policy.db.SimpleDBPolicyEngine.getPrivileges(SimpleDBPolicyEngine.java:98)
        at org.apache.sentry.provider.common.ResourceAuthorizationProvider.getPrivileges(ResourceAuthorizationProvider.java:147)
        at org.apache.sentry.provider.common.ResourceAuthorizationProvider.doHasAccess(ResourceAuthorizationProvider.java:120)
        at org.apache.sentry.provider.common.ResourceAuthorizationProvider.hasAccess(ResourceAuthorizationProvider.java:107)
        at org.apache.impala.authorization.AuthorizationChecker.hasAccess(AuthorizationChecker.java:215)
        at org.apache.impala.authorization.AuthorizationChecker.checkAccess(AuthorizationChecker.java:128)
        at org.apache.impala.analysis.AnalysisContext.authorizePrivilegeRequest(AnalysisContext.java:592)
        at org.apache.impala.analysis.AnalysisContext.authorize(AnalysisContext.java:564)
        at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:415)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1240)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1210)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1182)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:158)
{noformat}

Issue worsens when running concurrent workloads, because the underlying Sentry 
{{listPrivileges()}} is synchronized and that serializes all the query analysis 
requests.

{noformat}
  public synchronized Set listPrivileges(Set groups,  <
  ActiveRoleSet roleSet) {
Set privileges = Sets.newHashSet();
if (roleSet != ActiveRoleSet.ALL) {
  throw new UnsupportedOperationException("Impala does not support role subsets.");
}
{noformat}

Notes:
- If the authorization metadata footprint is small, this issue can be ignored.
- One workaround is to run the query using a role that has a very small number of 
privileges (revoke privileges that are not necessary to run a given query).
- Another workaround is to disable Sentry authorization.
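
To illustrate why the stack above is expensive, and one generic way such a cost can be avoided: the trace shows the privilege name being rebuilt and lowercased on every {{getName()}} call while the policy lock is held. The sketch below computes the name once at construction instead. It is an assumption-laden illustration only; the class, its fields, and the name format are hypothetical and this is not presented as the fix adopted for this issue.

```java
import java.util.Locale;

/** Hypothetical value class, not Impala code: caches its lowercase name at construction. */
final class CachedPrivilegeName {
  private final String name;

  CachedPrivilegeName(String scope, String db, String table, String privilege) {
    // The lowercase conversion runs once here rather than on every getName() call.
    this.name = String.join("+",
        "scope=" + scope, "db=" + db, "table=" + table, "privilege=" + privilege)
        .toLowerCase(Locale.ROOT);
  }

  /** Returns the precomputed name; cheap enough to call inside a synchronized section. */
  String getName() {
    return name;
  }
}
```

Because the object is immutable, callers holding the lock only pay for a field read per privilege.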

  was:
The patch for IMPALA-7616 caused a performance regression in analysis time when 
run in an environment with ~1k roles and 10.5k privileges. The regression is 
evident when run as a role that has a large number of privileges.

Following is the stack to look for when jstacking the coordinator.

{noformat}
"Thread-21" #49 prio=5 os_prio=0 tid=0x0cd6e000 nid=0x6a3d runnable 
[0x7fa28e4a]
   java.lang.Thread.State: RUNNABLE
at java.lang.String.toLowerCase(String.java:2670)
at 
org.apache.impala.catalog.PrincipalPrivilege.buildPrivilegeName(PrincipalPrivilege.java:82)
at 
org.apache.impala.catalog.PrincipalPrivilege.getName(PrincipalPrivilege.java:143)
at 
org.apache.impala.catalog.AuthorizationPolicy.listPrivileges(AuthorizationPolicy.java:423)
- locked <0x7fa376987100> (a 
org.apache.impala.catalog.AuthorizationPolicy)
at 
org.apache.impala.catalog.AuthorizationPolicy.listPrivileges(AuthorizationPolicy.java:443)
- locked <0x7fa376987100> (a 
org.apache.impala.catalog.AuthorizationPolicy)
at 
org.apache.sentry.provider.cache.SimpleCacheProviderBackend.getPrivileges(SimpleCacheProviderBackend.java:75)
at 
org.apache.sentry.policy.db.SimpleDBPolicyEngine.getPrivileges(SimpleDBPolicyEngine.java:98)
at 
org.apache.sentry.provider.common.ResourceAuthorizationProvider.getPrivileges(ResourceAuthorizationProvider.java:147)
at 

[jira] [Updated] (IMPALA-9072) Advanced features and write support in ORC support

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9072:

Summary: Advanced features and write support in ORC support  (was: Advanced 
features in ORC support)

> Advanced features and write support in ORC support
> --
>
> Key: IMPALA-9072
> URL: https://issues.apache.org/jira/browse/IMPALA-9072
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Quanlong Huang
>Priority: Major
>
> Support full functionality for read/write ORC file format tables. JIRAs in 
> this epic may have lower priority unless they're highly voted.






[jira] [Updated] (IMPALA-9175) Revisit the error handling logics in ORC scanner

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9175:

Priority: Critical  (was: Major)

> Revisit the error handling logics in ORC scanner
> 
>
> Key: IMPALA-9175
> URL: https://issues.apache.org/jira/browse/IMPALA-9175
> Project: IMPALA
>  Issue Type: Task
>Reporter: Quanlong Huang
>Priority: Critical
>







[jira] [Updated] (IMPALA-6772) Enable test_scanners_fuzz for ORC format

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-6772:

Priority: Critical  (was: Major)

> Enable test_scanners_fuzz for ORC format
> 
>
> Key: IMPALA-6772
> URL: https://issues.apache.org/jira/browse/IMPALA-6772
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Currently, we haven't enabled test_scanners_fuzz for ORC, since the ORC 
> library (release-1.4.3) is not robust against corrupt files (ORC-315). We should 
> enable it after a new version of the ORC library is released.






[jira] [Updated] (IMPALA-9074) Add support for zstd in ORC

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9074:

Priority: Critical  (was: Major)

> Add support for zstd in ORC
> ---
>
> Key: IMPALA-9074
> URL: https://issues.apache.org/jira/browse/IMPALA-9074
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: id_name_zstd.orc
>
>
> The ORC lib already supports reading and writing zstd-compressed ORC files. 
> However, a quick attempt in Impala failed:
> {code:sql}
> hive> create table orc_zstd (id int, name string) stored as orc;
> $ hdfs dfs -put id_name_zstd.orc 
> hdfs://localhost:20500/test-warehouse/orc_zstd
> impala-shell> invalidate metadata orc_zstd;
> impala-shell> select * from orc_zstd;
> ERROR: Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_zstd/id_name_zstd.orc: Unknown 
> compression codec 5
> {code}
> The ORC file is generated by the csv-import tool: 
> https://github.com/apache/orc/blob/rel/release-1.6.0/tools/src/CSVFileImport.cc
> (Manually changing the compression from ZLIB to ZSTD in it)
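
For readers decoding the error above: the number in "Unknown compression codec 5" corresponds to a value of the CompressionKind enum in the ORC file footer, where 5 is ZSTD. A minimal lookup sketch follows; the class and method names are hypothetical, and the numeric mapping is taken from the ORC protobuf definition as I understand it.

```java
import java.util.Map;

/** Hypothetical helper, not Impala or ORC code: names the ORC CompressionKind values. */
final class OrcCompressionKinds {
  // Numeric values of CompressionKind in the ORC footer definition.
  private static final Map<Integer, String> KINDS = Map.of(
      0, "NONE",
      1, "ZLIB",
      2, "SNAPPY",
      3, "LZO",
      4, "LZ4",
      5, "ZSTD");

  /** Returns the codec name for a numeric kind, or a marker string for unknown values. */
  static String describe(int kind) {
    return KINDS.getOrDefault(kind, "UNKNOWN(" + kind + ")");
  }
}
```

Under that mapping, the scanner in this report rejected the file simply because it did not yet recognize kind 5.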






[jira] [Updated] (IMPALA-6943) ORC support with full functionality

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-6943:

Priority: Critical  (was: Major)

> ORC support with full functionality
> ---
>
> Key: IMPALA-6943
> URL: https://issues.apache.org/jira/browse/IMPALA-6943
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Quanlong Huang
>Priority: Critical
>
> Support basic functionality for reading ORC file format tables including 
> stability works for edge cases. This is the first milestone to make ORC 
> support GA. Other advanced features will be tracked in IMPALA-9072.






[jira] [Updated] (IMPALA-9174) Revisit the memory pattern of the ORC scanner

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9174:

Priority: Critical  (was: Major)

> Revisit the memory pattern of the ORC scanner
> -
>
> Key: IMPALA-9174
> URL: https://issues.apache.org/jira/browse/IMPALA-9174
> Project: IMPALA
>  Issue Type: Task
>Reporter: Quanlong Huang
>Priority: Critical
>







[jira] [Updated] (IMPALA-8184) Add timestamp validation to Orc scanner

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8184:

Priority: Critical  (was: Minor)

> Add timestamp validation to Orc scanner
> ---
>
> Key: IMPALA-8184
> URL: https://issues.apache.org/jira/browse/IMPALA-8184
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Critical
>
> Similarly to Parquet, Orc can also contain timestamps that are not valid in 
> Impala, e.g. Hive can insert timestamps before 1400, which are invalid 
> in Impala. These invalid timestamps are often handled similarly to NULL, but 
> are not actually "real" NULLs, which can lead to some weird behavior:
> Hive:
> create table orcts (ts timestamp) stored as orc;
> insert into orcts values ("1200-01-01");
> Impala:
> select * from orcts where ts is not null;
> Returns 1 row:
> NULL
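
A minimal sketch of the kind of validation being asked for, assuming Impala's documented TIMESTAMP range of 1400-01-01 through 9999-12-31. The names validate_timestamp, MIN_TS and MAX_TS are hypothetical, and the real scanner is C++; this only illustrates turning out-of-range values into real NULLs before any predicate is evaluated.

```python
from datetime import datetime

# Assumed bounds of a valid Impala TIMESTAMP.
MIN_TS = datetime(1400, 1, 1)
MAX_TS = datetime(9999, 12, 31, 23, 59, 59, 999999)


def validate_timestamp(ts):
    """Return ts if it lies in the valid range, otherwise None (a real NULL)."""
    if ts is None:
        return None
    return ts if MIN_TS <= ts <= MAX_TS else None


# The value written by Hive in the example, plus one valid value.
raw_rows = [datetime(1200, 1, 1), datetime(2019, 11, 20)]
validated = [validate_timestamp(ts) for ts in raw_rows]

# "where ts is not null" evaluated after validation no longer returns
# the out-of-range row.
not_null = [ts for ts in validated if ts is not None]
print(not_null)
```

With validation applied at scan time, the IS NOT NULL predicate sees a genuine NULL and filters the row out instead of returning it.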






[jira] [Updated] (IMPALA-8046) Support CREATE TABLE from an ORC file

2019-11-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8046:

Priority: Critical  (was: Major)

> Support CREATE TABLE from an ORC file
> -
>
> Key: IMPALA-8046
> URL: https://issues.apache.org/jira/browse/IMPALA-8046
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Critical
>
> Impala supports creating a table using the schema of a file. However, only 
> parquet is supported currently. This ticket tracks adding support for 
> creating table from ORC files.






[jira] [Assigned] (IMPALA-9009) Core support for column mask transformation in select list

2019-11-17 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9009:
---

Assignee: Quanlong Huang  (was: Dinesh Garg)

> Core support for column mask transformation in select list
> --
>
> Key: IMPALA-9009
> URL: https://issues.apache.org/jira/browse/IMPALA-9009
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Assignee: Quanlong Huang
>Priority: Critical
>
> Identify masked columns from SELECT list.
> Support custom (user supplied) mask SQL from Ranger.
> Parse column mask expressions and substitute into original statement






[jira] [Work started] (IMPALA-9009) Core support for column mask transformation in select list

2019-11-17 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9009 started by Dinesh Garg.
---
> Core support for column mask transformation in select list
> --
>
> Key: IMPALA-9009
> URL: https://issues.apache.org/jira/browse/IMPALA-9009
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Assignee: Dinesh Garg
>Priority: Critical
>
> Identify masked columns from SELECT list.
> Support custom (user supplied) mask SQL from Ranger.
> Parse column mask expressions and substitute into original statement






[jira] [Assigned] (IMPALA-9010) Support pre-defined mask types from Ranger UI

2019-11-13 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9010:
---

Assignee: Fang-Yu Rao

> Support pre-defined mask types from Ranger UI
> -
>
> Key: IMPALA-9010
> URL: https://issues.apache.org/jira/browse/IMPALA-9010
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> Review Hive implementation/behavior.
> Redact/Partial/Hash/Nullify/Unmasked/Date
>  These will be implemented as static SQL transforms in Impala
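
To make the listed mask types concrete, here is a rough sketch of what such static transforms could look like. It assumes the conventions of Hive's mask UDFs (uppercase to "X", lowercase to "x", digits to "n"); the function names are hypothetical, and the exact Ranger semantics still need to be confirmed as part of the review mentioned above.

```python
import hashlib


def mask_redact(value):
    """Redact: replace letters and digits, keep other characters."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


def mask_show_last_4(value):
    """Partial: redact everything except the last four characters."""
    if len(value) <= 4:
        return value
    return mask_redact(value[:-4]) + value[-4:]


def mask_hash(value):
    """Hash: replace the value with a hex digest (SHA-256 chosen here as an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_null(value):
    """Nullify: always return NULL."""
    return None


print(mask_redact("Ab1-"), mask_show_last_4("1234-5678"))
```

Each transform is a pure function of the column value, which is what makes it possible to express them as static SQL rewrites.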






[jira] [Work started] (IMPALA-9110) Add table loading time break-down metrics for HdfsTable

2019-11-05 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9110 started by Dinesh Garg.
---
> Add table loading time break-down metrics for HdfsTable
> ---
>
> Key: IMPALA-9110
> URL: https://issues.apache.org/jira/browse/IMPALA-9110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Jiawei Wang
>Assignee: Dinesh Garg
>Priority: Critical
>
> We are only able to get total table loading time right now, which makes it 
> really hard for us to debug why table loading is sometimes slow. Therefore, 
> it would be good to have break-down metrics showing how much time each function 
> takes when loading tables.
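
One possible shape for such a break-down, sketched in Python for brevity (the catalog itself is Java). The class LoadTimer and the phase names are hypothetical and only illustrate accumulating per-phase timings next to the total.

```python
import time
from contextlib import contextmanager


class LoadTimer:
    """Accumulates wall-clock time per named phase of a table load."""

    def __init__(self):
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.phases.values())


timer = LoadTimer()
with timer.phase("load_schema"):          # hypothetical phase name
    pass
with timer.phase("load_file_metadata"):   # hypothetical phase name
    pass

for name, seconds in timer.phases.items():
    print(f"{name}: {seconds:.6f}s")
```

Recording the phases in a try/finally block means a phase that throws still shows up in the report, which is usually when the numbers are most needed.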






[jira] [Updated] (IMPALA-9013) Column Masking DML support

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9013:

Priority: Critical  (was: Major)

> Column Masking DML support
> --
>
> Key: IMPALA-9013
> URL: https://issues.apache.org/jira/browse/IMPALA-9013
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Review Hive implementation to see if anything special needs to be done for 
> DML. The Hive column masking design doc does not reflect the current code.






[jira] [Updated] (IMPALA-9011) Support column masking on CTEs, views, and derived column names

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9011:

Priority: Critical  (was: Major)

> Support column masking on CTEs, views, and derived column names
> ---
>
> Key: IMPALA-9011
> URL: https://issues.apache.org/jira/browse/IMPALA-9011
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> CTE/views: dig out underlying column and table names
>  derived column names i.e. select * from (select 1) as foo - Handle 
> appropriately.
> Also negative cases where the query has an invalid reference. i.e.
> WITH foo AS (SELECT c1 FROM t1) SELECT c1 FROM FOO;






[jira] [Updated] (IMPALA-9012) Allow access to columns with column masks and update tests

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9012:

Priority: Critical  (was: Major)

> Allow access to columns with column masks and update tests
> --
>
> Key: IMPALA-9012
> URL: https://issues.apache.org/jira/browse/IMPALA-9012
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Remove check in RangerAuthorizationChecker::authorizeTableAccess
> Remove testcase in RangerAuditLogTest.java






[jira] [Updated] (IMPALA-9010) Support pre-defined mask types from Ranger UI

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9010:

Priority: Critical  (was: Major)

> Support pre-defined mask types from Ranger UI
> -
>
> Key: IMPALA-9010
> URL: https://issues.apache.org/jira/browse/IMPALA-9010
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Review Hive implementation/behavior.
> Redact/Partial/Hash/Nullify/Unmasked/Date
>  These will be implemented as static SQL transforms in Impala






[jira] [Updated] (IMPALA-9079) Add Auth Interfaces to retrieve column masks and implement for Ranger

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9079:

Priority: Critical  (was: Major)

> Add Auth Interfaces to retrieve column masks and implement for Ranger
> -
>
> Key: IMPALA-9079
> URL: https://issues.apache.org/jira/browse/IMPALA-9079
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Kurt Deschler
>Priority: Critical
>
> Masks definitions can be retrieved from the ranger plugin. Analyzer has 
> access to AuthorizationFactory via Analyzer::getAuthzFactory(). There are 
> currently no interfaces through AuthorizationFactory or AuthorizationChecker 
> to access the column masks from the plugin. These will need to be added and 
> then implemented for the Ranger plugin.






[jira] [Updated] (IMPALA-9009) Core support for column mask transformation in select list

2019-10-31 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9009:

Priority: Critical  (was: Major)

> Core support for column mask transformation in select list
> --
>
> Key: IMPALA-9009
> URL: https://issues.apache.org/jira/browse/IMPALA-9009
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Kurt Deschler
>Priority: Critical
>
> Identify masked columns from SELECT list.
> Support custom (user supplied) mask SQL from Ranger.
> Parse column mask expressions and substitute into original statement






[jira] [Assigned] (IMPALA-9108) Unused leveldbjni dependency triggers some security scanners

2019-10-30 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9108:
---

Assignee: Tim Armstrong  (was: Dinesh Garg)

> Unused leveldbjni dependency triggers some security scanners
> 
>
> Key: IMPALA-9108
> URL: https://issues.apache.org/jira/browse/IMPALA-9108
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>
> A windows dll in leveldbjni-all-1.8.jar is flagged by some security scanners. 
> We shouldn't have a dependency on leveldb, so we should exclude this and not 
> pull in the jar.






[jira] [Updated] (IMPALA-9108) Unused leveldbjni dependency triggers some security scanners

2019-10-30 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-9108:

Priority: Critical  (was: Major)

> Unused leveldbjni dependency triggers some security scanners
> 
>
> Key: IMPALA-9108
> URL: https://issues.apache.org/jira/browse/IMPALA-9108
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>
> A windows dll in leveldbjni-all-1.8.jar is flagged by some security scanners. 
> We shouldn't have a dependency on leveldb, so we should exclude this and not 
> pull in the jar.






[jira] [Work started] (IMPALA-9108) Unused leveldbjni dependency triggers some security scanners

2019-10-30 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9108 started by Dinesh Garg.
---
> Unused leveldbjni dependency triggers some security scanners
> 
>
> Key: IMPALA-9108
> URL: https://issues.apache.org/jira/browse/IMPALA-9108
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Dinesh Garg
>Priority: Critical
>
> A windows dll in leveldbjni-all-1.8.jar is flagged by some security scanners. 
> We shouldn't have a dependency on leveldb, so we should exclude this and not 
> pull in the jar.






[jira] [Assigned] (IMPALA-9092) Fix "show create table" tests on USE_CDP_HIVE=true to account for HIVE-22158

2019-10-28 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-9092:
---

Assignee: Vihang Karajgaonkar

> Fix "show create table" tests on USE_CDP_HIVE=true to account for HIVE-22158
> 
>
> Key: IMPALA-9092
> URL: https://issues.apache.org/jira/browse/IMPALA-9092
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Vihang Karajgaonkar
>Priority: Blocker
>
> Hive changed behavior with HIVE-22158 so that only transactional tables are 
> considered managed and all others are considered external. This means that a 
> regular "create table" will result in an external table with table properties 
> of 'TRANSLATED_TO_EXTERNAL'='TRUE', 'external.table.purge'='TRUE'. This 
> breaks our tests that rely on "show create table", because the table is newly 
> external and has extra table properties. For example:
> {noformat}
> query_test/test_kudu.py:842: in test_primary_key_and_distribution
> db=cursor.conn.db_name, kudu_addr=KUDU_MASTER_HOSTS))
> query_test/test_kudu.py:824: in assert_show_create_equals
> assert cursor.fetchall()[0][0] == \
> E   assert "CREATE EXTER...='localhost')" == "CREATE TABLE ...='localhost')"
> E - CREATE EXTERNAL TABLE testshowcreatetable_15312_ggn1hk.nvbpxfuxze
> E ?-
> E + CREATE TABLE testshowcreatetable_15312_ggn1hk.nvbpxfuxze (
> E ? ++
> E +   c INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
> E +   PRIMARY KEY (c)
> E + )
> E + PARTITION BY HASH (c) PARTITIONS 3
> E   STORED AS KUDU
> E - TBLPROPERTIES ('TRANSLATED_TO_EXTERNAL'='TRUE', 
> 'external.table.purge'='TRUE', 'kudu.master_addresses'='localhost')
> E + TBLPROPERTIES ('kudu.master_addresses'='localhost'){noformat}
> We need to decide on the right behavior for "show create table" and update 
> the tests. 
> For Kudu tables, tables with TRANSLATED_TO_EXTERNAL=true and 
> external.table.purge=TRUE should be equivalent to a non-external Kudu table, 
> and we can just detect this case and generate the same SQL as before.
> Other cases may need new logic. I think it makes sense to also address other 
> tests due to MANAGED vs EXTERNAL distinction or extra table properties with 
> this JIRA. Here is a list of tests that seem to have this problem:
> {noformat}
> metadata/test_ddl.py TestDdlStatements.test_create_alter_tbl_properties
> metadata/test_show_create_table.py *
> query_test/test_kudu.py TestShowCreateTable*
> org.apache.impala.catalog.CatalogTest.testCreateTableMetadata
> org.apache.impala.catalog.local.LocalCatalogTest.testKuduTable{noformat}
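
The Kudu case described above could be detected roughly as follows. This is a sketch only: the function names and the "EXTERNAL_TABLE" type string are assumptions for illustration, and the actual logic would live in Impala's Java frontend.

```python
# Properties added by the Hive translation that should not be shown.
HIDDEN_PROPS = {"translated_to_external", "external.table.purge"}


def is_translated_managed_table(table_type, props):
    """True if an external table is really a translated managed table."""
    lowered = {k.lower(): str(v).lower() for k, v in props.items()}
    return (
        table_type == "EXTERNAL_TABLE"
        and lowered.get("translated_to_external") == "true"
        and lowered.get("external.table.purge") == "true"
    )


def show_create_prefix(table_type, props):
    """Return the CREATE keyword and the table properties to print."""
    if is_translated_managed_table(table_type, props):
        shown = {k: v for k, v in props.items() if k.lower() not in HIDDEN_PROPS}
        return "CREATE TABLE", shown
    if table_type == "EXTERNAL_TABLE":
        return "CREATE EXTERNAL TABLE", dict(props)
    return "CREATE TABLE", dict(props)


props = {
    "TRANSLATED_TO_EXTERNAL": "TRUE",
    "external.table.purge": "TRUE",
    "kudu.master_addresses": "localhost",
}
print(show_create_prefix("EXTERNAL_TABLE", props))
```

Under these assumptions the output matches the SQL expected by the failing test: a plain CREATE TABLE with only the kudu.master_addresses property.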






[jira] [Updated] (IMPALA-8937) Fine grained table metadata loading on Catalog server

2019-10-10 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8937:

Priority: Critical  (was: Major)

> Fine grained table metadata loading on Catalog server
> -
>
> Key: IMPALA-8937
> URL: https://issues.apache.org/jira/browse/IMPALA-8937
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.12.0, Impala 3.3.0
>Reporter: Bharath Vissapragada
>Priority: Critical
>
> *Background*:
> Currently the table _on the Catalog server_ is either in a loaded or unloaded 
> state (IncompleteTable). When Catalog server starts for the first time, we 
> first fetch a list of table names for each database and every table in this 
> list starts as an unloaded table. The table lists are propagated to the 
> coordinators so that they know whether a table with a given name exists or 
> not and they can start analyzing the queries. No metadata is loaded in the 
> incomplete tables (like schema/ownership, comments etc.)
> The table metadata is loaded lazily (and the table moves into a loaded state) 
> when it is referenced in any query. When a load request comes in, all the 
> table metadata is loaded including file block information. 
> *Problem:* 
> Coordinators need some additional information when analyzing unloaded tables. 
> For example: IMPALA-8228. The ownership information is a part of the HMS 
> table schema which is not loaded until the table is marked fully loaded. 
> While this is not a problem for regular queries (like select * from ), 
> it is an issue with queries like "show tables" which do not trigger a table 
> load. In this particular case, due to the lack of ownership information, the 
> output of the table listing could be different depending on whether the table 
> is loaded. Another example is IMPALA-8606 where the GET_TABLES request does 
> not return the table comments because they are not available for unloaded 
> tables.
> *Ask:*
> We need to consider finer grained loading on the Catalog server in general. 
> Instead of having a binary state (loaded vs unloaded), the table could be in 
> a partially loaded state. We could also start with aggressively fetching 
> certain pieces of information that we think could aid with analysis and 
> lazily load the remaining pieces of metadata. Finer grained loading also 
> integrates well with the LocalCatalog implementation on the coordinators 
> where the entire table need not be loaded on the Catalog server to serve 
> partial meta information (e.g: show partitions ).
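
As a rough illustration of the "partially loaded" idea, the binary state could be replaced by a set of independently loadable pieces. Everything here (the Loaded flags, CatalogTable, ensure) is hypothetical and only sketches the shape of the proposal; it is not the catalog's actual design.

```python
from enum import Flag, auto


class Loaded(Flag):
    """Pieces of table metadata that can be loaded independently."""
    NONE = 0
    SCHEMA = auto()
    OWNERSHIP = auto()
    COMMENT = auto()
    PARTITIONS = auto()
    FILE_METADATA = auto()


class CatalogTable:
    def __init__(self, name):
        self.name = name
        self.loaded = Loaded.NONE

    def has(self, pieces):
        return (self.loaded & pieces) == pieces

    def ensure(self, pieces):
        """Load only the pieces that are still missing."""
        missing = pieces & ~self.loaded
        # A real implementation would fetch `missing` from HMS or HDFS here.
        self.loaded |= missing
        return missing


t = CatalogTable("t")
t.ensure(Loaded.OWNERSHIP | Loaded.COMMENT)   # enough for "show tables"
print(t.has(Loaded.OWNERSHIP), t.has(Loaded.FILE_METADATA))
```

A "show tables" request would then only need the cheap pieces, while the expensive file block information stays lazy.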






[jira] [Updated] (IMPALA-4025) add functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()

2019-10-07 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-4025:

Priority: Critical  (was: Major)

> add functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()
> 
>
> Key: IMPALA-4025
> URL: https://issues.apache.org/jira/browse/IMPALA-4025
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Yongzhi Chen
>Priority: Critical
>  Labels: built-in-function, sql-language
>
> Add the following functions as both an aggregate function and window/analytic 
> function:
> * PERCENTILE_CONT
> * PERCENTILE_DISC
> * MEDIAN (implemented as PERCENTILE_CONT(0.5))
> h6. Syntax
> {code}
> PERCENTILE_CONT() WITHIN GROUP (ORDER BY  [ASC|DESC] 
> [NULLS {FIRST | LAST}]) [ OVER ([])]
> PERCENTILE_DISC() WITHIN GROUP (ORDER BY  [ASC|DESC] 
> [NULLS {FIRST | LAST}]) [ OVER ([])]
> MEDIAN(expr) [ OVER () ]
> {code}
> h6. Notes from other systems
> *Greenplum*
> {code}
> PERCENTILE_CONT(_percentage_) WITHIN GROUP (ORDER BY _expression_)
> {code}
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html
> Greenplum Database provides the MEDIAN aggregate function, which returns the 
> fiftieth percentile of the PERCENTILE_CONT result and special aggregate 
> expressions for inverse distribution functions as follows:
> Currently you can use only these two expressions with the keyword WITHIN 
> GROUP.
> Note: aggregate function only
> *Oracle*
> {code}
> PERCENTILE_CONT(expr) WITHIN GROUP (ORDER BY expr [ DESC | ASC ]) [ OVER 
> (query_partition_clause) ]
> {code}
> http://docs.oracle.com/database/121/SQLRF/functions141.htm#SQLRF00687
> Note: implemented as both an aggregate and window function
> *Vertica*
> {code}
> PERCENTILE_CONT ( %_number ) WITHIN GROUP (... ORDER BY expression [ ASC | 
> DESC ] ) OVER (... [ window-partition-clause ] )
> {code}
> https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm
> Note: window function only
> *Teradata*
> {code}
> PERCENTILE_CONT() WITHIN GROUP (ORDER BY  
> [asc | desc] [nulls {first | last}])
> {code}
> Note: aggregate function only
> *Netezza*
> {code}
> SELECT fn() WITHIN GROUP (ORDER BY  [asc|desc] [nulls 
> {first | last}]) FROM [GROUP BY ];
> {code}
> https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_inverse_distribution_funcs_family_syntax.html
> Note: aggregate function only
> *Redshift*
> {code}
> PERCENTILE_CONT ( percentile ) WITHIN GROUP (ORDER BY expr) OVER (  [ 
> PARTITION BY expr_list ]  )
> {code}
> https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_inverse_distribution_funcs_family_syntax.html
> Note: window function only
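
For readers unfamiliar with the difference between the two functions, the sketch below follows the usual SQL definitions: PERCENTILE_CONT interpolates linearly between neighboring values, while PERCENTILE_DISC returns an actual value from the input. It assumes ascending order, ignores NULLs, and is not how Impala would implement the aggregates.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile: linear interpolation at rank p * (n - 1)."""
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    rank = p * (len(xs) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    if lo == hi:
        return float(xs[lo])
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])


def percentile_disc(values, p):
    """Discrete percentile: first value whose cumulative distribution >= p."""
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    idx = max(math.ceil(p * len(xs)) - 1, 0)
    return xs[idx]


def median(values):
    """MEDIAN is PERCENTILE_CONT(0.5), as stated in the description."""
    return percentile_cont(values, 0.5)


data = [1, 2, 3, 4]
print(median(data), percentile_disc(data, 0.5))
```

For the four values above, the median falls between 2 and 3 and is interpolated, whereas the discrete percentile picks 2, a value that actually occurs in the data.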






[jira] [Assigned] (IMPALA-8954) Support uncorrelated subqueries in the select list

2019-10-07 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8954:
---

Assignee: Shant Hovsepian

> Support uncorrelated subqueries in the select list
> --
>
> Key: IMPALA-8954
> URL: https://issues.apache.org/jira/browse/IMPALA-8954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Shant Hovsepian
>Priority: Critical
>  Labels: tpc-ds
>
> {noformat}
> [localhost:21000] default> select 'foo', (select 'bar');
> Query: select 'foo', (select 'bar')
> Query submitted at: 2019-09-18 13:44:43 (Coordinator: 
> http://tarmstrong-box:25000)
> ERROR: AnalysisException: Subqueries are not supported in the select list.
> {noformat}
> I think we can support these, implemented as a nested loop join with a 
> cardinality check node if needed.
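
The plan shape suggested above can be sketched as follows: the subquery side passes through a cardinality check that rejects more than one row, and its single value is then joined to every outer row. Function names are hypothetical and the row representation (tuples) is chosen only for illustration.

```python
def cardinality_check(rows):
    """Return the single subquery value, NULL for no rows, error for more than one."""
    if len(rows) > 1:
        raise RuntimeError("Subquery must not return more than one row")
    return rows[0] if rows else None


def nested_loop_join(outer_rows, subquery_rows):
    """Append the scalar subquery value to every outer row."""
    value = cardinality_check(subquery_rows)
    return [row + (value,) for row in outer_rows]


# select 'foo', (select 'bar')
print(nested_loop_join([("foo",)], ["bar"]))
```

Because the subquery is uncorrelated, it only needs to be evaluated once, regardless of how many outer rows there are.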






[jira] [Updated] (IMPALA-3531) Implement deferrable and optionally enforced PK/FK constraints

2019-10-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-3531:

Priority: Critical  (was: Minor)

> Implement deferrable and optionally enforced PK/FK constraints
> --
>
> Key: IMPALA-3531
> URL: https://issues.apache.org/jira/browse/IMPALA-3531
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend, Perf Investigation
>Affects Versions: Impala 2.5.0, Impala 2.6.0
> Environment: CDH
>Reporter: Ruslan Dautkhanov
>Assignee: Anurag Mantripragada
>Priority: Critical
>  Labels: CBO, performance, ramp-up
>
> Oracle has a "RELY NOVALIDATE" option for constraints. It could be easier for 
> Hive to start with something like that for PK/FK constraints, so that the CBO has more 
> information for optimizations. It does not have to actually check whether the 
> constraint relationship is true; it can just "rely" on that constraint.
> https://docs.oracle.com/database/121/SQLRF/clauses002.htm#sthref2289
> So it would be helpful with join cardinality estimates, and with cases like 
> IMPALA-2929.
> https://docs.oracle.com/database/121/DWHSG/schemas.htm#DWHSG9053
> "Overview of Constraint States":
> - Enforcement
> - Validation
> - Belief
> So FK/PK with "rely novalidate" will have Enforcement disabled but 
> Belief = RELY, as is possible in Oracle and now in Hive (HIVE-13076).
> This opens up many additional ways to optimize execution plans.
> As explained in Tom Kyte's "Metadata matters"
> http://www.peoug.org/wp-content/uploads/2009/12/MetadataMatters_PEOUG_Day2009_TKyte.pdf
> pp.30 - "Tell us how the tables relate and we can remove them from the 
> plan...".
> pp.35 - "Tell us how the tables relate and we have more access paths 
> available...".
> It might also be helpful when integrating Impala with Kudu, as the 
> latter requires a PK.






[jira] [Updated] (IMPALA-8291) 'DESCRIBE EXTENDED ..' does not display constraint information

2019-10-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8291:

Priority: Critical  (was: Major)

> 'DESCRIBE EXTENDED ..' does not display constraint information
> --
>
> Key: IMPALA-8291
> URL: https://issues.apache.org/jira/browse/IMPALA-8291
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Anurag Mantripragada
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> Currently, DESCRIBE EXTENDED table_name command does not display constraint 
> information like primary key / foreign key information for tables created 
> through Hive.
> This work must also be extended to tables created through Impala once we have 
> support for pk/fk in create table syntax.
>  






[jira] [Updated] (IMPALA-2112) Support primary key/foreign key constraint as part of create table in Impala

2019-10-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-2112:

Priority: Critical  (was: Minor)

> Support primary key/foreign key constraint as part of create table in Impala
> 
>
> Key: IMPALA-2112
> URL: https://issues.apache.org/jira/browse/IMPALA-2112
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.2
>Reporter: Marcel Kinard
>Assignee: Anurag Mantripragada
>Priority: Critical
>  Labels: planner
>
> These would be advisory, ie, Impala would not attempt to enforce them. 
> However, they could be used for cardinality estimation during query planning.
> To be compatible with Hive:
>  * We neither enforce nor validate integrity constraints. Hence, DISABLE and 
> NOVALIDATE options are mandatory.
>  * RELY/NORELY is optional. The CBO is expected to use this information when 
> a user specifies “RELY”. The default is NORELY.
>  * Since we do not yet have UNIQUE in Hive, the FK must reference a primary 
> key column in the parent table.
> Support the same create table syntax as Hive:
>  * {{create table pk(id1 integer, id2 integer, primary key(id1, id2) 
> DISABLE NOVALIDATE);}}
>  * {{create table fk(id1 integer, id2 integer, constraint c1 foreign 
> key(id1, id2) references pk(id2, id1) DISABLE NOVALIDATE);}}
>  * {{create table T1(id integer, name string, primary key(id) DISABLE 
> NOVALIDATE RELY);}}
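
To show why advisory constraints help cardinality estimation, here is a textbook-style sketch, not Impala's planner logic. It assumes that with a relied-upon FK to PK relationship every FK row matches exactly one PK row (no NULL foreign keys), and otherwise falls back to the common estimate based on distinct-value counts. All names are hypothetical.

```python
def estimate_join_cardinality(fk_rows, pk_rows, fk_ndv, pk_ndv, rely):
    """Estimate the output rows of an equi-join between an FK and a PK table.

    rely=True: trust the declared constraint, so the join preserves the
               FK-side row count.
    rely=False: generic estimate |L| * |R| / max(ndv_L, ndv_R).
    """
    if rely:
        return fk_rows
    return fk_rows * pk_rows / max(fk_ndv, pk_ndv, 1)


print(estimate_join_cardinality(1000, 100, 200, 100, rely=True))
print(estimate_join_cardinality(1000, 100, 200, 100, rely=False))
```

With stale or missing distinct-value statistics the generic estimate can be far off, while the RELY case needs no column statistics at all.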



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8290) Display constraint information in 'SHOW CREATE' statement

2019-10-06 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8290:

Priority: Critical  (was: Minor)

> Display constraint information in 'SHOW CREATE' statement
> -
>
> Key: IMPALA-8290
> URL: https://issues.apache.org/jira/browse/IMPALA-8290
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Anurag Mantripragada
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> Show create statement should display primary key and foreign key information.
>  
>  






[jira] [Updated] (IMPALA-8954) Support uncorrelated subqueries in the select list

2019-10-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8954:

Priority: Critical  (was: Major)

> Support uncorrelated subqueries in the select list
> --
>
> Key: IMPALA-8954
> URL: https://issues.apache.org/jira/browse/IMPALA-8954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: tpc-ds
>
> {noformat}
> [localhost:21000] default> select 'foo', (select 'bar');
> Query: select 'foo', (select 'bar')
> Query submitted at: 2019-09-18 13:44:43 (Coordinator: 
> http://tarmstrong-box:25000)
> ERROR: AnalysisException: Subqueries are not supported in the select list.
> {noformat}
> I think we can support these, implemented as a nested loop join with a 
> cardinality check node if needed.






[jira] [Updated] (IMPALA-891) Add support for intersect and except set operations

2019-10-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-891:
---
Priority: Critical  (was: Major)

> Add support for intersect and except set operations
> ---
>
> Key: IMPALA-891
> URL: https://issues.apache.org/jira/browse/IMPALA-891
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.4, Impala 2.5.0, Impala 2.6.0, Impala 2.7.0
>Reporter: Jonathan Seidman
>Priority: Critical
>  Labels: sql-language, usability
>
> Set functionality includes the below.  Today, Impala has just {{UNION}} & 
> {{UNION ALL}}.
> {code}
> UNION [DISTINCT]
> UNION ALL
> INTERSECT [DISTINCT]
> INTERSECT ALL
> EXCEPT [DISTINCT]
> EXCEPT ALL
> * MINUS is an alias for EXCEPT
> {code}
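
For illustration only, the multiset semantics of the operators listed above can be sketched in Python. This is a minimal sketch that assumes the usual SQL definitions (INTERSECT ALL keeps the minimum of the two row counts, EXCEPT ALL keeps the non-negative difference). The function names are hypothetical and are not part of Impala.

```python
from collections import Counter


def intersect_all(left, right):
    # INTERSECT ALL: a row appears min(count in left, count in right) times.
    return sorted((Counter(left) & Counter(right)).elements())


def except_all(left, right):
    # EXCEPT ALL: a row appears max(count in left - count in right, 0) times.
    return sorted((Counter(left) - Counter(right)).elements())


def intersect_distinct(left, right):
    # INTERSECT [DISTINCT]: rows present in both inputs, duplicates removed.
    return sorted(set(left) & set(right))


def except_distinct(left, right):
    # EXCEPT [DISTINCT] (alias MINUS): rows in left that are absent from right.
    return sorted(set(left) - set(right))


if __name__ == "__main__":
    left, right = [1, 1, 2, 3], [1, 2, 2]
    print(intersect_all(left, right))
    print(except_all(left, right))
    print(intersect_distinct(left, right))
    print(except_distinct(left, right))
```

Rows are modelled here as plain hashable values; a real implementation would compare whole tuples and would have to treat NULLs as equal to each other for set operations.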






[jira] [Assigned] (IMPALA-5098) Correct handling of DISTINCT in the select list

2019-10-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-5098:
---

Assignee: Kurt Deschler

> Correct handling of DISTINCT in the select list
> ---
>
> Key: IMPALA-5098
> URL: https://issues.apache.org/jira/browse/IMPALA-5098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: N Campbell
>Assignee: Kurt Deschler
>Priority: Critical
>  Labels: ansi-sql, sql-language
>
> DB2, Oracle, and various other systems support the following statement, 
> but Impala does not:
> {noformat}
> [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, 
> SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000,
> errorMessage:AnalysisException: cannot combine SELECT DISTINCT with analytic 
> functions
> ), Query: SELECT DISTINCT 
> `sno` AS `c1`, 
> `pno` AS `c2`, 
> SUM(`qty`)
> OVER(
> ) AS `c3`
> FROM
> `cert`.`tsupply` 
> ORDER BY 
> `sno` ASC NULLS LAST, 
> `pno` ASC NULLS LAST.
> {noformat}






[jira] [Updated] (IMPALA-5098) Correct handling of DISTINCT in the select list

2019-10-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-5098:

Priority: Critical  (was: Major)

> Correct handling of DISTINCT in the select list
> ---
>
> Key: IMPALA-5098
> URL: https://issues.apache.org/jira/browse/IMPALA-5098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: N Campbell
>Priority: Critical
>  Labels: ansi-sql, sql-language
>
> DB2, Oracle, and various other systems support the following statement, 
> but Impala does not:
> {noformat}
> [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, 
> SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000,
> errorMessage:AnalysisException: cannot combine SELECT DISTINCT with analytic 
> functions
> ), Query: SELECT DISTINCT 
> `sno` AS `c1`, 
> `pno` AS `c2`, 
> SUM(`qty`)
> OVER(
> ) AS `c3`
> FROM
> `cert`.`tsupply` 
> ORDER BY 
> `sno` ASC NULLS LAST, 
> `pno` ASC NULLS LAST.
> {noformat}






[jira] [Assigned] (IMPALA-4025) add functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()

2019-10-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-4025:
---

Assignee: Yongzhi Chen  (was: Tianyi Wang)

> add functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()
> 
>
> Key: IMPALA-4025
> URL: https://issues.apache.org/jira/browse/IMPALA-4025
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: built-in-function, sql-language
>
> Add the following functions as both an aggregate function and window/analytic 
> function:
> * PERCENTILE_CONT
> * PERCENTILE_DISC
> * MEDIAN (implemented as PERCENTILE_CONT(0.5))
> h6. Syntax
> {code}
> PERCENTILE_CONT() WITHIN GROUP (ORDER BY  [ASC|DESC] 
> [NULLS {FIRST | LAST}]) [ OVER ([])]
> PERCENTILE_DISC() WITHIN GROUP (ORDER BY  [ASC|DESC] 
> [NULLS {FIRST | LAST}]) [ OVER ([])]
> MEDIAN(expr) [ OVER () ]
> {code}
> h6. Notes from other systems
> *Greenplum*
> {code}
> PERCENTILE_CONT(_percentage_) WITHIN GROUP (ORDER BY _expression_)
> {code}
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html
> Greenplum Database provides the MEDIAN aggregate function, which returns the 
> fiftieth percentile of the PERCENTILE_CONT result and special aggregate 
> expressions for inverse distribution functions as follows:
> Currently you can use only these two expressions with the keyword WITHIN 
> GROUP.
> Note: aggregate function only
> *Oracle*
> {code}
> PERCENTILE_CONT(expr) WITHIN GROUP (ORDER BY expr [ DESC | ASC ]) [ OVER 
> (query_partition_clause) ]
> {code}
> http://docs.oracle.com/database/121/SQLRF/functions141.htm#SQLRF00687
> Note: implemented as both an aggregate and window function
> *Vertica*
> {code}
> PERCENTILE_CONT ( %_number ) WITHIN GROUP (... ORDER BY expression [ ASC | 
> DESC ] ) OVER (... [ window-partition-clause ] )
> {code}
> https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm
> Note: window function only
> *Teradata*
> {code}
> PERCENTILE_CONT() WITHIN GROUP (ORDER BY  
> [asc | desc] [nulls {first | last}])
> {code}
> Note: aggregate function only
> *Netezza*
> {code}
> SELECT fn() WITHIN GROUP (ORDER BY  [asc|desc] [nulls 
> {first | last}]) FROM [GROUP BY ];
> {code}
> https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_inverse_distribution_funcs_family_syntax.html
> Note: aggregate function only
> *Redshift*
> {code}
> PERCENTILE_CONT ( percentile ) WITHIN GROUP (ORDER BY expr) OVER (  [ 
> PARTITION BY expr_list ]  )
> {code}
> https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_inverse_distribution_funcs_family_syntax.html
> Note: window function only
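
As a rough illustration of the requested semantics, the three functions can be sketched in Python. This is a minimal sketch, not Impala code, and it assumes the commonly documented definitions: PERCENTILE_CONT interpolates linearly between adjacent sorted values, PERCENTILE_DISC returns the first sorted value whose cumulative distribution reaches the requested fraction, and NULLs (None here) are ignored. All function names are hypothetical.

```python
import math


def percentile_cont(values, p):
    # Continuous percentile: linear interpolation between neighbouring values.
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    rn = p * (len(xs) - 1)  # zero-based fractional row position
    lo, hi = math.floor(rn), math.ceil(rn)
    return xs[lo] + (xs[hi] - xs[lo]) * (rn - lo)


def percentile_disc(values, p):
    # Discrete percentile: smallest value whose cumulative distribution >= p.
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    k = max(math.ceil(p * len(xs)), 1)  # one-based rank, at least the first row
    return xs[k - 1]


def median(values):
    # MEDIAN expressed as PERCENTILE_CONT(0.5), as the issue proposes.
    return percentile_cont(values, 0.5)


if __name__ == "__main__":
    data = [4, 1, None, 3, 2]
    print(percentile_cont(data, 0.5), percentile_disc(data, 0.5), median(data))
```

The sketch covers only the aggregate form over an ascending sort; the window form would apply the same computation per partition.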






[jira] [Updated] (IMPALA-8981) Support column masking in Impala

2019-10-01 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8981:

Priority: Critical  (was: Major)

> Support column masking in Impala
> 
>
> Key: IMPALA-8981
> URL: https://issues.apache.org/jira/browse/IMPALA-8981
> Project: IMPALA
>  Issue Type: New Feature
>Affects Versions: Impala 3.4.0
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Critical
>
> Related Hive Jira https://issues.apache.org/jira/browse/HIVE-13125






[jira] [Updated] (IMPALA-8994) Support Row Filtering in Impala

2019-10-01 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8994:

Priority: Critical  (was: Major)

> Support Row Filtering in Impala
> ---
>
> Key: IMPALA-8994
> URL: https://issues.apache.org/jira/browse/IMPALA-8994
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> Related Hive Jira https://issues.apache.org/jira/browse/HIVE-13125






[jira] [Updated] (IMPALA-5049) Switch Parquet timestamp format to INT64

2019-09-29 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-5049:

Priority: Critical  (was: Major)

> Switch Parquet timestamp format to INT64
> 
>
> Key: IMPALA-5049
> URL: https://issues.apache.org/jira/browse/IMPALA-5049
> Project: IMPALA
>  Issue Type: Epic
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Lars Volker
>Priority: Critical
>  Labels: parquet
>
> We currently use INT96 to store Timestamp values in Parquet files which will 
> be deprecated in 
> [PARQUET-323|https://issues.apache.org/jira/browse/PARQUET-323].
> We need to add read and write support for INT64-based logical types 
> (TIMESTAMP_MILLIS, TIMESTAMP_MICROS) to our Parquet scanner and writer.






[jira] [Assigned] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad

2019-09-18 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7506:
---

Assignee: Quanlong Huang  (was: Dinesh Garg)

> Support global INVALIDATE METADATA on fetch-on-demand impalad
> -
>
> Key: IMPALA-7506
> URL: https://issues.apache.org/jira/browse/IMPALA-7506
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-v2
>
> There is some complexity with how this is implemented in the original code: 
> it depends on maintaining the minimum version of any object in the impalad's 
> local cache. We can't determine that in an on-demand impalad, so INVALIDATE 
> METADATA is not supported currently on "fetch-on-demand".






[jira] [Work started] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad

2019-09-18 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7506 started by Dinesh Garg.
---
> Support global INVALIDATE METADATA on fetch-on-demand impalad
> -
>
> Key: IMPALA-7506
> URL: https://issues.apache.org/jira/browse/IMPALA-7506
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Assignee: Dinesh Garg
>Priority: Major
>  Labels: catalog-v2
>
> There is some complexity with how this is implemented in the original code: 
> it depends on maintaining the minimum version of any object in the impalad's 
> local cache. We can't determine that in an on-demand impalad, so INVALIDATE 
> METADATA is not supported currently on "fetch-on-demand".






[jira] [Assigned] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger

2019-09-12 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8587:
---

Assignee: Fang-Yu Rao  (was: Austin Nobis)

> Show inherited privileges in show grant w/ Ranger
> -
>
> Key: IMPALA-8587
> URL: https://issues.apache.org/jira/browse/IMPALA-8587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Austin Nobis
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> If an admin has privileges from:
> *grant all on server to user admin;*
>  
> Currently the command below will show no results:
> *show grant user admin on database functional;*
>  
> After the change, the user should see server level privileges from:
> *show grant user admin on database functional;*
>  






[jira] [Assigned] (IMPALA-8877) CatalogException during stress test: Table modified while operation was in progress

2019-09-04 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8877:
---

Assignee: Anurag Mantripragada  (was: Vihang Karajgaonkar)

> CatalogException during stress test: Table  modified while operation was 
> in progress
> -
>
> Key: IMPALA-8877
> URL: https://issues.apache.org/jira/browse/IMPALA-8877
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.3.0
>Reporter: David Knupp
>Assignee: Anurag Mantripragada
>Priority: Critical
>  Labels: catalog-v2
> Attachments: catalogd.INFO.tar.gz, impalad.INFO.tar.gz
>
>
> This was hit while running the stress tests to get a baseline on a deployed 
> cluster.
> /* Mem: 12850 MB. Coordinator: quasar-mzmnbe-6.vpc.cloudera.com. */
> COMPUTE STATS catalog_sales
> {noformat}
> Query (id=924a50178a5a6146:29d58a73)
>   Summary
> Session ID: 5543fb9029e2b71f:f446381b1f59ed81
> Session Type: HIVESERVER2
> HiveServer2 Protocol Version: V6
> Start Time: 2019-08-19 01:26:07.292866000
> End Time: 2019-08-19 01:26:27.248053000
> Query Type: DDL
> Query State: EXCEPTION
> Query Status: CatalogException: Table 
> 'tpcds_300_decimal_parquet.catalog_sales' was modified while operation was in 
> progress, aborting execution.
> Impala Version: impalad version 3.3.0-SNAPSHOT RELEASE (build 
> df3e7c051e2641524fc53a0cd07c2a14decd55f7)
> User: syst...@vpc.cloudera.com
> Connected User: syst...@vpc.cloudera.com
> Delegated User: 
> Network Address: :::10.65.6.19:39174
> Default Db: tpcds_300_decimal_parquet
> Sql Statement: /* Mem: 12850 MB. Coordinator: 
> quasar-mzmnbe-6.vpc.cloudera.com. */
> COMPUTE STATS catalog_sales
> Coordinator: quasar-mzmnbe-6.vpc.cloudera.com:22000
> Query Options (set by configuration): 
> ABORT_ON_ERROR=1,MEM_LIMIT=13474201600,MT_DOP=4,EXEC_TIME_LIMIT_S=2147483647,TIMEZONE=America/Los_Angeles,DEFAULT_FILE_FORMAT=4,DEFAULT_TRANSACTIONAL_TYPE=1
> Query Options (set by configuration and planner): 
> ABORT_ON_ERROR=1,MEM_LIMIT=13474201600,MT_DOP=4,EXEC_TIME_LIMIT_S=2147483647,TIMEZONE=America/Los_Angeles,DEFAULT_FILE_FORMAT=4,DEFAULT_TRANSACTIONAL_TYPE=1
> DDL Type: COMPUTE_STATS
> Query Compilation
>   Metadata of all 1 tables cached: 5.62s (5622372318)
>   Analysis finished: 5.62s (5622560027)
>   Authorization finished (noop): 5.62s (5622568284)
>   Retried query planning due to inconsistent metadata 7 of 40 times: 
> Catalog object TCatalogObject(type:TABLE, catalog_version:94204, 
> table:TTable(db_name:tpcds_300_decimal_parquet, tbl_name:catalog_sales)) 
> changed version between accesses.: 5.95s (5949859598)
>   Planning finished: 5.95s (5949861145)
> Query Timeline
>   Query submitted: 0ns (0)
>   Planning finished: 5.95s (5950024020)
>   Child queries finished: 17.85s (17849072057)
>   Rows available: 19.82s (19825080035)
>   Unregister query: 19.95s (19955080560)
> Frontend
>   - CatalogFetch.ColumnStats.Misses: 34 (34)
>   - CatalogFetch.ColumnStats.Requests: 34 (34)
>   - CatalogFetch.ColumnStats.Time: 0 (0)
>   - CatalogFetch.Config.Hits: 1 (1)
>   - CatalogFetch.Config.Requests: 1 (1)
>   - CatalogFetch.Config.Time: 0 (0)
>   - CatalogFetch.DatabaseList.Hits: 8 (8)
>   - CatalogFetch.DatabaseList.Requests: 8 (8)
>   - CatalogFetch.DatabaseList.Time: 0 (0)
>   - CatalogFetch.PartitionLists.Misses: 1 (1)
>   - CatalogFetch.PartitionLists.Requests: 1 (1)
>   - CatalogFetch.PartitionLists.Time: 7 (7)
>   - CatalogFetch.Partitions.Hits: 1837 (1837)
>   - CatalogFetch.Partitions.Misses: 1837 (1837)
>   - CatalogFetch.Partitions.Requests: 3674 (3674)
>   - CatalogFetch.Partitions.Time: 325 (325)
>   - CatalogFetch.RPCs.Bytes: 4.7 MiB (4936030)
>   - CatalogFetch.RPCs.Requests: 22 (22)
>   - CatalogFetch.RPCs.Time: 343 (343)
>   - CatalogFetch.TableNames.Hits: 4 (4)
>   - CatalogFetch.TableNames.Misses: 4 (4)
>   - CatalogFetch.TableNames.Requests: 8 (8)
>   - CatalogFetch.TableNames.Time: 0 (0)
>   - CatalogFetch.Tables.Misses: 8 (8)
>   - CatalogFetch.Tables.Requests: 8 (8)
>   - CatalogFetch.Tables.Time: 74 (74)
>   - InactiveTotalTime: 0ns (0)
>   - TotalTime: 0ns (0)
>   ImpalaServer
> - CatalogOpExecTimer: 1.97s (1972007962)
> - ClientFetchWaitTimer: 0ns (0)
> - InactiveTotalTime: 0ns (0)
> - RowMaterializationTimer: 0ns (0)
> - TotalTime: 0ns (0)
>   Child Queries
> Table Stats Query (id=db4821e4aa5bb04d:d4a5ae45)
> Column Stats Query (id=0444367557e3496d:f9435111)
> {noformat}




[jira] [Updated] (IMPALA-8877) CatalogException during stress test: Table modified while operation was in progress

2019-08-29 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8877:

Labels: catalog-v2  (was: )

> CatalogException during stress test: Table  modified while operation was 
> in progress
> -
>
> Key: IMPALA-8877
> URL: https://issues.apache.org/jira/browse/IMPALA-8877
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.3.0
>Reporter: David Knupp
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: catalog-v2
> Attachments: catalogd.INFO.tar.gz, impalad.INFO.tar.gz
>
>
> This was hit while running the stress tests to get a baseline on a deployed 
> cluster.
> /* Mem: 12850 MB. Coordinator: quasar-mzmnbe-6.vpc.cloudera.com. */
> COMPUTE STATS catalog_sales
> {noformat}
> Query (id=924a50178a5a6146:29d58a73)
>   Summary
> Session ID: 5543fb9029e2b71f:f446381b1f59ed81
> Session Type: HIVESERVER2
> HiveServer2 Protocol Version: V6
> Start Time: 2019-08-19 01:26:07.292866000
> End Time: 2019-08-19 01:26:27.248053000
> Query Type: DDL
> Query State: EXCEPTION
> Query Status: CatalogException: Table 
> 'tpcds_300_decimal_parquet.catalog_sales' was modified while operation was in 
> progress, aborting execution.
> Impala Version: impalad version 3.3.0-SNAPSHOT RELEASE (build 
> df3e7c051e2641524fc53a0cd07c2a14decd55f7)
> User: syst...@vpc.cloudera.com
> Connected User: syst...@vpc.cloudera.com
> Delegated User: 
> Network Address: :::10.65.6.19:39174
> Default Db: tpcds_300_decimal_parquet
> Sql Statement: /* Mem: 12850 MB. Coordinator: 
> quasar-mzmnbe-6.vpc.cloudera.com. */
> COMPUTE STATS catalog_sales
> Coordinator: quasar-mzmnbe-6.vpc.cloudera.com:22000
> Query Options (set by configuration): 
> ABORT_ON_ERROR=1,MEM_LIMIT=13474201600,MT_DOP=4,EXEC_TIME_LIMIT_S=2147483647,TIMEZONE=America/Los_Angeles,DEFAULT_FILE_FORMAT=4,DEFAULT_TRANSACTIONAL_TYPE=1
> Query Options (set by configuration and planner): 
> ABORT_ON_ERROR=1,MEM_LIMIT=13474201600,MT_DOP=4,EXEC_TIME_LIMIT_S=2147483647,TIMEZONE=America/Los_Angeles,DEFAULT_FILE_FORMAT=4,DEFAULT_TRANSACTIONAL_TYPE=1
> DDL Type: COMPUTE_STATS
> Query Compilation
>   Metadata of all 1 tables cached: 5.62s (5622372318)
>   Analysis finished: 5.62s (5622560027)
>   Authorization finished (noop): 5.62s (5622568284)
>   Retried query planning due to inconsistent metadata 7 of 40 times: 
> Catalog object TCatalogObject(type:TABLE, catalog_version:94204, 
> table:TTable(db_name:tpcds_300_decimal_parquet, tbl_name:catalog_sales)) 
> changed version between accesses.: 5.95s (5949859598)
>   Planning finished: 5.95s (5949861145)
> Query Timeline
>   Query submitted: 0ns (0)
>   Planning finished: 5.95s (5950024020)
>   Child queries finished: 17.85s (17849072057)
>   Rows available: 19.82s (19825080035)
>   Unregister query: 19.95s (19955080560)
> Frontend
>   - CatalogFetch.ColumnStats.Misses: 34 (34)
>   - CatalogFetch.ColumnStats.Requests: 34 (34)
>   - CatalogFetch.ColumnStats.Time: 0 (0)
>   - CatalogFetch.Config.Hits: 1 (1)
>   - CatalogFetch.Config.Requests: 1 (1)
>   - CatalogFetch.Config.Time: 0 (0)
>   - CatalogFetch.DatabaseList.Hits: 8 (8)
>   - CatalogFetch.DatabaseList.Requests: 8 (8)
>   - CatalogFetch.DatabaseList.Time: 0 (0)
>   - CatalogFetch.PartitionLists.Misses: 1 (1)
>   - CatalogFetch.PartitionLists.Requests: 1 (1)
>   - CatalogFetch.PartitionLists.Time: 7 (7)
>   - CatalogFetch.Partitions.Hits: 1837 (1837)
>   - CatalogFetch.Partitions.Misses: 1837 (1837)
>   - CatalogFetch.Partitions.Requests: 3674 (3674)
>   - CatalogFetch.Partitions.Time: 325 (325)
>   - CatalogFetch.RPCs.Bytes: 4.7 MiB (4936030)
>   - CatalogFetch.RPCs.Requests: 22 (22)
>   - CatalogFetch.RPCs.Time: 343 (343)
>   - CatalogFetch.TableNames.Hits: 4 (4)
>   - CatalogFetch.TableNames.Misses: 4 (4)
>   - CatalogFetch.TableNames.Requests: 8 (8)
>   - CatalogFetch.TableNames.Time: 0 (0)
>   - CatalogFetch.Tables.Misses: 8 (8)
>   - CatalogFetch.Tables.Requests: 8 (8)
>   - CatalogFetch.Tables.Time: 74 (74)
>   - InactiveTotalTime: 0ns (0)
>   - TotalTime: 0ns (0)
>   ImpalaServer
> - CatalogOpExecTimer: 1.97s (1972007962)
> - ClientFetchWaitTimer: 0ns (0)
> - InactiveTotalTime: 0ns (0)
> - RowMaterializationTimer: 0ns (0)
> - TotalTime: 0ns (0)
>   Child Queries
> Table Stats Query (id=db4821e4aa5bb04d:d4a5ae45)
> Column Stats Query (id=0444367557e3496d:f9435111)
> {noformat}




[jira] [Resolved] (IMPALA-8889) Incorrect exception message when trying unsupported option for acid tables

2019-08-28 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg resolved IMPALA-8889.
-
Resolution: Fixed

> Incorrect exception message when trying unsupported option for acid tables
> --
>
> Key: IMPALA-8889
> URL: https://issues.apache.org/jira/browse/IMPALA-8889
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Critical
>
> When we try an unsupported option, say alter table, on acid tables from , it 
> throws an exception, which is expected, but it gives a wrong message:
>  It says we only support Read for insert-only tables, which is no longer 
> true, since we now also support insert and drop (and soon truncate).








[jira] [Resolved] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-27 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg resolved IMPALA-8793.
-
Resolution: Fixed

> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE does not merely create a new empty base directory; it actually 
> removes all the files, which is also the behavior of Hive.
> To implement TRUNCATE Impala must acquire an EXCLUSIVE lock on the table. 
> After that Impala must recursively delete all the data files belonging to the 
> table.








[jira] [Assigned] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-26 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8793:
---

Assignee: Zoltán Borók-Nagy

> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE does not merely create a new empty base directory; it actually 
> removes all the files, which is also the behavior of Hive.
> To implement TRUNCATE Impala must acquire an EXCLUSIVE lock on the table. 
> After that Impala must recursively delete all the data files belonging to the 
> table.






[jira] [Assigned] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-26 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8793:
---

Assignee: (was: Dinesh Garg)

> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE does not merely create a new empty base directory; it actually 
> removes all the files, which is also the behavior of Hive.
> To implement TRUNCATE Impala must acquire an EXCLUSIVE lock on the table. 
> After that Impala must recursively delete all the data files belonging to the 
> table.






[jira] [Commented] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-26 Thread Dinesh Garg (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916393#comment-16916393
 ] 

Dinesh Garg commented on IMPALA-8793:
-

[https://gerrit.cloudera.org/c/14071/]

> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Dinesh Garg
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE must not merely create a new empty base directory; it must actually 
> remove all the files, which is also how Hive behaves.
> To implement TRUNCATE Impala must acquire an EXCLUSIVE lock on the table. 
> After that Impala must recursively delete all the data files belonging to the 
> table.






[jira] [Work started] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-26 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8793 started by Dinesh Garg.
---
> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Dinesh Garg
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE must not merely create a new empty base directory; it must actually 
> remove all the files, which is also how Hive behaves.
> To implement TRUNCATE Impala must acquire an EXCLUSIVE lock on the table. 
> After that Impala must recursively delete all the data files belonging to the 
> table.






[jira] [Assigned] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad

2019-08-23 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7506:
---

Assignee: Quanlong Huang

> Support global INVALIDATE METADATA on fetch-on-demand impalad
> -
>
> Key: IMPALA-7506
> URL: https://issues.apache.org/jira/browse/IMPALA-7506
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-v2
>
> There is some complexity with how this is implemented in the original code: 
> it depends on maintaining the minimum version of any object in the impalad's 
> local cache. We can't determine that in an on-demand impalad, so INVALIDATE 
> METADATA is not supported currently on "fetch-on-demand".






[jira] [Updated] (IMPALA-8875) TestHmsIntegration.test_drop_column_maintains_stats seems flaky

2019-08-22 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8875:

Labels: broken-build impala-stats  (was: broken-build)

> TestHmsIntegration.test_drop_column_maintains_stats seems flaky
> ---
>
> Key: IMPALA-8875
> URL: https://issues.apache.org/jira/browse/IMPALA-8875
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Gabor Kaszab
>Priority: Blocker
>  Labels: broken-build, impala-stats
>
> The test TestHmsIntegration.test_drop_column_maintains_stats seems flaky. 
> The related test file was updated recently for 
> https://issues.apache.org/jira/browse/IMPALA-8823. I am creating this JIRA to 
> track the failing test. [~gaborkaszab], could you take a brief look at this? 
> Thanks!
> The error message is shown below.
> {code:java}
> Error Message 
> assert {'avg_col_len...ializer', ...} == {'COLUMN_STATS...me': 'x', ...} 
> Common items: {'avg_col_len': '', 'bitVector': '', 'col_name': 'x', 
> 'comment': 'from deserializer', 'data_type': 'int', 'distinct_count': '0', 
> 'max': '0', 'max_col_len': '', 'min': '0', 'num_falses': '', 'num_nulls': 
> '0', 'num_trues': ''} Right contains more items: {'COLUMN_STATS_ACCURATE': 
> '{}'} Full diff: + {'COLUMN_STATS_ACCURATE': '{}', - {'avg_col_len': '', ? ^ 
> + 'avg_col_len': '', ? ^ 'bitVector': '', 'col_name': 'x', 'comment': 'from 
> deserializer', 'data_type': 'int', 'distinct_count': '0', 'max': '0', 
> 'max_col_len': '', 'min': '0', 'num_falses': '', 'num_nulls': '0', 
> 'num_trues': ''}
> {code}
> The stack trace is given as follows.
> {code:java}
> Stacktrace
> metadata/test_hms_integration.py:390: in test_drop_column_maintains_stats
> assert hive_x_stats == self.hive_column_stats(table_name, 'x')
> E   assert {'avg_col_len...ializer', ...} == {'COLUMN_STATS...me': 'x', ...}
> E Common items:
> E {'avg_col_len': '',
> E  'bitVector': '',
> E  'col_name': 'x',
> E  'comment': 'from deserializer',
> E  'data_type': 'int',
> E  'distinct_count': '0',
> E  'max': '0',
> E  'max_col_len': '',
> E  'min': '0',
> E  'num_falses': '',
> E  'num_nulls': '0',
> E  'num_trues': ''}
> E Right contains more items:
> E {'COLUMN_STATS_ACCURATE': '{}'}
> E Full diff:
> E + {'COLUMN_STATS_ACCURATE': '{}',
> E - {'avg_col_len': '',
> E ? ^
> E +  'avg_col_len': '',
> E ? ^
> E 'bitVector': '',
> E 'col_name': 'x',
> E 'comment': 'from deserializer',
> E 'data_type': 'int',
> E 'distinct_count': '0',
> E 'max': '0',
> E 'max_col_len': '',
> E 'min': '0',
> E 'num_falses': '',
> E 'num_nulls': '0',
> E 'num_trues': ''}
> {code}






[jira] [Assigned] (IMPALA-8572) Move query hook execution to before query unregistration

2019-08-20 Thread Dinesh Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8572:
---

Assignee: bharath v  (was: radford nguyen)

> Move query hook execution to before query unregistration
> 
>
> Key: IMPALA-8572
> URL: https://issues.apache.org/jira/browse/IMPALA-8572
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: radford nguyen
>Assignee: bharath v
>Priority: Critical
>
> The backend currently executes query event hooks during 
> {{ImpalaServer::UnregisterQuery}}, which may actually only happen a long time 
> after the query has actually executed. We depend on either the client closing 
> the query/session, the client's connection dropping, or an idle session 
> timing out.
> e.g. the following sequence is possible.
>  # User executes query from Hue.
>  # User goes home for weekend, leaving Hue tab open in browser
>  # If we're lucky, the session timeout expires after some amount of idle time.
>  # The query gets unregistered, hooks get executed
> It would generally be desirable to move the lineage logger earlier in the 
> query lifecycle, so it occurs as soon as all of the required data is 
> available.
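
A minimal sketch of the change the description asks for: run the hooks as soon as execution has finished, and keep unregistration only as a fallback. The `Query` class and its method names are hypothetical and do not correspond to Impala's backend code; the sketch only shows the ordering and the run-once guard.

```python
class Query:
    """Toy query object whose hooks run at most once."""

    def __init__(self, hooks):
        self._hooks = list(hooks)
        self._hooks_ran = False

    def _run_hooks_once(self):
        # Guard so that a later unregister() does not fire the hooks again.
        if not self._hooks_ran:
            self._hooks_ran = True
            for hook in self._hooks:
                hook(self)

    def finish_execution(self):
        # Assumption: everything the hooks need is available at this point,
        # so there is no reason to wait for the client to close the query.
        self._run_hooks_once()

    def unregister(self):
        # Fallback for queries that never reached finish_execution().
        self._run_hooks_once()
```

With this ordering, a user leaving a Hue tab open over the weekend delays only the unregistration, not the hook execution.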






[jira] [Updated] (IMPALA-8572) Move query hook execution to before query unregistration

2019-08-15 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8572:

Priority: Critical  (was: Major)

> Move query hook execution to before query unregistration
> 
>
> Key: IMPALA-8572
> URL: https://issues.apache.org/jira/browse/IMPALA-8572
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: radford nguyen
>Assignee: radford nguyen
>Priority: Critical
>
> The backend currently executes query event hooks during 
> {{ImpalaServer::UnregisterQuery}}, which may actually only happen a long time 
> after the query has actually executed. We depend on either the client closing 
> the query/session, the client's connection dropping, or an idle session 
> timing out.
> e.g. the following sequence is possible.
>  # User executes query from Hue.
>  # User goes home for weekend, leaving Hue tab open in browser
>  # If we're lucky, the session timeout expires after some amount of idle time.
>  # The query gets unregistered, hooks get executed
> It would generally be desirable to move the lineage logger earlier in the 
> query lifecycle, so it occurs as soon as all of the required data is 
> available.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)




[jira] [Assigned] (IMPALA-8572) Move query hook execution to before query unregistration

2019-08-15 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8572:
---

Assignee: radford nguyen

> Move query hook execution to before query unregistration
> 
>
> Key: IMPALA-8572
> URL: https://issues.apache.org/jira/browse/IMPALA-8572
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: radford nguyen
>Assignee: radford nguyen
>Priority: Major
>
> The backend currently executes query event hooks during 
> {{ImpalaServer::UnregisterQuery}}, which may actually only happen a long time 
> after the query has actually executed. We depend on either the client closing 
> the query/session, the client's connection dropping, or an idle session 
> timing out.
> e.g. the following sequence is possible.
>  # User executes query from Hue.
>  # User goes home for weekend, leaving Hue tab open in browser
>  # If we're lucky, the session timeout expires after some amount of idle time.
>  # The query gets unregistered, hooks get executed
> It would generally be desirable to move the lineage logger earlier in the 
> query lifecycle, so it occurs as soon as all of the required data is 
> available.






[jira] [Resolved] (IMPALA-8823) Implement DROP TABLE for insert-only ACID tables

2019-08-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg resolved IMPALA-8823.
-
Resolution: Fixed

> Implement DROP TABLE for insert-only ACID tables
> 
>
> Key: IMPALA-8823
> URL: https://issues.apache.org/jira/browse/IMPALA-8823
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot drop insert-only ACID tables.
> To implement DROP TABLE for insert-only tables, we first need to acquire an 
> exclusive lock from HMS, then proceed with the usual DROP TABLE process.
> Heartbeating the lock might also be needed.






[jira] [Updated] (IMPALA-8717) impala-shell support for HiveServer2 HTTP endpoint

2019-07-25 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8717:

Priority: Critical  (was: Major)

> impala-shell support for HiveServer2 HTTP endpoint
> --
>
> Key: IMPALA-8717
> URL: https://issues.apache.org/jira/browse/IMPALA-8717
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Affects Versions: Impala 3.3.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
>
> It would be very helpful for impala-shell to support connecting to the HTTP 
> HS2 endpoints.






[jira] [Updated] (IMPALA-8636) Implement INSERT for insert-only ACID tables

2019-07-25 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8636:

Priority: Critical  (was: Major)

> Implement INSERT for insert-only ACID tables
> 
>
> Key: IMPALA-8636
> URL: https://issues.apache.org/jira/browse/IMPALA-8636
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: impala-acid
>
> Impala should support insertion for insert-only ACID tables.
> For this we need to allocate a write ID for the target table, and write the 
> data into the base/delta directories.
> INSERT operation should create a new delta directory with the allocated write 
> ID.
> INSERT OVERWRITE should create a new base directory with the allocated write 
> ID. This new base directory will only contain the data coming from this 
> operation.
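
The mapping from operation type to target directory can be sketched as below. The helper `target_dir` is hypothetical, and the exact name format (a `delta_<writeId>_<writeId>` or `base_<writeId>` directory with seven-digit zero padding) is an assumption made for illustration rather than something stated in this issue.

```python
def target_dir(write_id, overwrite=False):
    """Return the directory name an insert with this write ID would use.

    INSERT           -> a new delta directory for the allocated write ID
    INSERT OVERWRITE -> a new base directory for the allocated write ID
    """
    if write_id <= 0:
        raise ValueError("write IDs are assumed to be positive")
    if overwrite:
        # The new base holds only the data written by this operation.
        return "base_%07d" % write_id
    # Assumed format: minimum and maximum write ID are the same for one insert.
    return "delta_%07d_%07d" % (write_id, write_id)
```

For example, `target_dir(5)` gives `delta_0000005_0000005`, while `target_dir(5, overwrite=True)` gives `base_0000005`.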






[jira] [Updated] (IMPALA-8725) Improve usability when HMS is configured with strict managed tables

2019-07-12 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8725:

Priority: Critical  (was: Major)

> Improve usability when HMS is configured with strict managed tables
> ---
>
> Key: IMPALA-8725
> URL: https://issues.apache.org/jira/browse/IMPALA-8725
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Anurag Mantripragada
>Priority: Critical
>
> Users tend to create and query managed tables often and when HMS is 
> configured with strict managed tables they get: 
> {code:java}
> Table default.foo failed strict managed table checks due to the following 
> reason: Table is marked as a managed table but is not transactional{code}
> We should improve usability in these scenarios.






[jira] [Assigned] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog

2019-07-01 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8486:
---

Assignee: (was: Todd Lipcon)

> test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
> -
>
> Key: IMPALA-8486
> URL: https://issues.apache.org/jira/browse/IMPALA-8486
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: catalog-v2
>
> {noformat}
>  TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
> self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
> tests/query_test/test_udfs.py:52: in _run_query_all_impalads
> assert result.data == expected
> E   assert ['Old UDF'] == ['New UDF']
> E At index 0 diff: 'Old UDF' != 'New UDF'
> E Full diff:
> E - ['Old UDF']
> E + ['New UDF']
> 
> {noformat}
> The tests are checking that the local UDF caches on each impalad get 
> invalidated by a drop/create of a function referencing the HDFS file 
> containing the UDF. The test fails because the local catalog, unlike the 
> regular catalog, doesn't invalidate LibCache entries upon receiving a catalog 
> update.
> I looked at this for long enough to realise that the invalidation mechanism 
> is fundamentally broken - it doesn't work with dedicated executors. It also 
> creates a race between the statestore updates and queries referencing the 
> UDFs - if the queries win the race, then they can incorrectly use the old 
> version that should have been invalidated.
> I think this is a potentially problematic issue because old JAR/SO versions 
> could persist in the cache indefinitely if old versions are overwritten in 
> place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Assigned] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad

2019-07-01 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7506:
---

Assignee: (was: Todd Lipcon)

> Support global INVALIDATE METADATA on fetch-on-demand impalad
> -
>
> Key: IMPALA-7506
> URL: https://issues.apache.org/jira/browse/IMPALA-7506
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: catalog-v2
>
> There is some complexity with how this is implemented in the original code: 
> it depends on maintaining the minimum version of any object in the impalad's 
> local cache. We can't determine that in an on-demand impalad, so INVALIDATE 
> METADATA is not supported currently on "fetch-on-demand".






[jira] [Assigned] (IMPALA-7615) Partition metadata mismatch should be handled gracefully in local catalog mode.

2019-07-01 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7615:
---

Assignee: bharath v

> Partition metadata mismatch should be handled gracefully in local catalog 
> mode.
> ---
>
> Key: IMPALA-7615
> URL: https://issues.apache.org/jira/browse/IMPALA-7615
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Major
>  Labels: catalog-v2
>
> *This is a Catalog v2 only improvement*
> An RPC to fetch partition metadata for a partition ID that does not exist on 
> the Catalog server currently throws IAE.
> {noformat}
> @Override
> public TGetPartialCatalogObjectResponse getPartialInfo(
>     TGetPartialCatalogObjectRequest req) throws TableLoadingException {
>   for (long partId : partIds) {
>     HdfsPartition part = partitionMap_.get(partId);
>     Preconditions.checkArgument(part != null, "Partition id %s does not exist",  <--
>         partId);
>     TPartialPartitionInfo partInfo = new TPartialPartitionInfo(partId);
>     if (req.table_info_selector.want_partition_names) {
>       partInfo.setName(part.getPartitionName());
>     }
>     if (req.table_info_selector.want_partition_metadata) {
>       partInfo.hms_partition = part.toHmsPartition();
> {noformat}
> This is undesirable since such exceptions are not transparently retried in 
> the frontend. Instead we should fix this code path to throw 
> InconsistentMetadataException, similar to what we do for other code paths 
> that handle such inconsistent metadata like version changes.
> An example stack trace that hits this issue looks as follows:
> {noformat}
> org.apache.impala.catalog.local.LocalCatalogException: Could not load 
> partitions for table partition_level_tests.store_sales
> at 
> org.apache.impala.catalog.local.LocalFsTable.loadPartitions(LocalFsTable.java:399)
> at 
> org.apache.impala.catalog.FeCatalogUtils.loadAllPartitions(FeCatalogUtils.java:207)
> at 
> org.apache.impala.catalog.local.LocalFsTable.getMajorityFormat(LocalFsTable.java:244)
> at 
> org.apache.impala.planner.HdfsTableSink.computeResourceProfile(HdfsTableSink.java:75)
> at 
> org.apache.impala.planner.PlanFragment.computeResourceProfile(PlanFragment.java:233)
> at org.apache.impala.planner.Planner.computeResourceReqs(Planner.java:365)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1020)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1162)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1077)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> Caused by: org.apache.thrift.TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[IllegalArgumentException: Partition id 10084 does not exist]), 
> lookup_status:OK)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:322)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadPartitionsFromCatalogd(CatalogdMetaProvider.java:644)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadPartitionsByRefs(CatalogdMetaProvider.java:610)
> at 
> org.apache.impala.catalog.local.LocalFsTable.loadPartitions(LocalFsTable.java:395)
> ... 9 more{noformat}
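
The proposed behaviour can be sketched in Python: a missing partition raises a dedicated, retryable error instead of a generic argument error, and the caller refreshes its view and tries again. All names here (`InconsistentMetadataError`, `get_partial_info`, `load_with_retries`, `fetch_ids`) are hypothetical stand-ins and not Impala's actual classes or retry logic.

```python
class InconsistentMetadataError(Exception):
    """Stand-in for a retryable 'the caller's view is stale' error."""


def get_partial_info(partition_map, part_ids):
    """Return partition info for part_ids, or signal a stale request."""
    infos = []
    for part_id in part_ids:
        part = partition_map.get(part_id)
        if part is None:
            # Retryable: the partition may have been dropped or replaced
            # after the caller cached its ID.
            raise InconsistentMetadataError(
                "Partition id %s does not exist" % part_id)
        infos.append(part)
    return infos


def load_with_retries(fetch_ids, partition_map, max_attempts=3):
    """Call get_partial_info, refetching the partition IDs on each retry.

    fetch_ids() returns the IDs the caller currently believes exist.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return get_partial_info(partition_map, fetch_ids())
        except InconsistentMetadataError as e:
            last_error = e  # stale view: loop around and refetch the IDs
    raise last_error
```

Any other exception type still propagates immediately, which mirrors the point of the issue: only the inconsistency case should be retried transparently.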






[jira] [Assigned] (IMPALA-7864) TestLocalCatalogRetries::test_replan_limit is flaky

2019-07-01 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7864:
---

Assignee: bharath v  (was: Vihang Karajgaonkar)

> TestLocalCatalogRetries::test_replan_limit is flaky
> ---
>
> Key: IMPALA-7864
> URL: https://issues.apache.org/jira/browse/IMPALA-7864
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
> Environment: Ubuntu 16.04
>Reporter: Jim Apple
>Assignee: bharath v
>Priority: Critical
>  Labels: broken-build, catalog-v2, flaky
> Fix For: Impala 3.2.0
>
>
> In https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3605/, 
> TestLocalCatalogRetries::test_replan_limit failed on an unrelated patch. On 
> my development machine, the test passed. 






[jira] [Assigned] (IMPALA-8648) Impala ACID read stress tests

2019-06-27 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8648:
---

Assignee: Csaba Ringhofer  (was: Todd Lipcon)

> Impala ACID read stress tests
> -
>
> Key: IMPALA-8648
> URL: https://issues.apache.org/jira/browse/IMPALA-8648
> Project: IMPALA
>  Issue Type: Test
>Reporter: Dinesh Garg
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>







[jira] [Assigned] (IMPALA-8369) Impala should be able to interoperate with Hive 3.1.0

2019-06-26 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8369:
---

Assignee: Vihang Karajgaonkar  (was: Dinesh Garg)

> Impala should be able to interoperate with Hive 3.1.0
> -
>
> Key: IMPALA-8369
> URL: https://issues.apache.org/jira/browse/IMPALA-8369
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: impala-acid
>
> Currently, Impala only works with Hive 2.1.1. Since Hive 3.1.0 has been 
> released for a while, it would be good to add support for Hive 3.1.0 (HMS 
> 3.1.0). This patch will focus on the ability to connect to HMS 3.1.0 and run 
> existing tests. It will not focus on adding support for newer features like 
> ACID in Hive 3.1.0, which can be taken up as a separate JIRA.
> It would be good to make changes to Impala source code such that it can work 
> with both Hive 2.1.0 and Hive 3.1.0 without the need to create a separate 
> branch. However, this should be an aspirational goal. If we hit a blocker we 
> should investigate alternative approaches.






[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

2019-06-26 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8663:

Labels: impala-acid  (was: )

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: impala-acid
>
> Currently, the file metadata loader recursively lists the table and partition 
> directories to get the FileStatus objects. For each FileStatus we ignore 
> hidden files in {{FileSystemUtil.isValidDataFile()}}. However, that is not 
> sufficient. For instance, if Hive is inserting data into a table when the 
> refresh is called, the staging directory may be present within the table 
> directory. This staging directory is a hidden directory named 
> {{.hive-staging_*}}. It may contain files that are not themselves hidden 
> (that is, whose names do not start with a . or _). Such files should be 
> treated as temporary files, not as valid data files.
>  
> Another instance where we see this happen is in transactional tables, which 
> have a {{.manifest}} file located in a {{_tmp}} directory within the table 
> directory. This file should also be skipped and not considered a valid 
> data file.
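
A minimal sketch of the check described above, assuming POSIX-style paths: a file is rejected if its own name is hidden, or if any directory between the table root and the file starts with `.` or `_`. The function names and example paths are hypothetical, and treating every `_`-prefixed directory as temporary is an assumption that goes slightly beyond the two cases named in the issue.

```python
import posixpath


def is_in_ignored_dir(table_dir, file_path):
    """True if any directory between table_dir and the file is hidden/tmp."""
    rel = posixpath.relpath(file_path, table_dir)
    parent_parts = rel.split("/")[:-1]  # drop the file name itself
    return any(p.startswith(".") or p.startswith("_") for p in parent_parts)


def is_valid_data_file(table_dir, file_path):
    """Combine the existing hidden-file check with the directory check."""
    name = posixpath.basename(file_path)
    if name.startswith(".") or name.startswith("_"):
        return False  # hidden file, which is already skipped today
    return not is_in_ignored_dir(table_dir, file_path)
```

A non-hidden file such as `/warehouse/t/.hive-staging_x/000000_0` passes the name check alone but is rejected once its parent directories are taken into account.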






[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

2019-06-26 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8663:

Priority: Critical  (was: Major)

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: impala-acid
>
> Currently, the file metadata loader recursively lists the table and partition 
> directories to get the FileStatus objects. For each FileStatus we ignore 
> hidden files in {{FileSystemUtil.isValidDataFile()}}. However, that is not 
> sufficient. For instance, if Hive is inserting data into a table when the 
> refresh is called, the staging directory may be present within the table 
> directory. This staging directory is a hidden directory named 
> {{.hive-staging_*}}. It may contain files that are not themselves hidden 
> (that is, whose names do not start with a . or _). Such files should be 
> treated as temporary files, not as valid data files.
>  
> Another instance where we see this happen is in transactional tables, which 
> have a {{.manifest}} file located in a {{_tmp}} directory within the table 
> directory. This file should also be skipped and not considered a valid 
> data file.






[jira] [Updated] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

2019-06-18 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8339:

Priority: Critical  (was: Major)

> Coordinator should be more resilient to fragment instances startup failure
> --
>
> Key: IMPALA-8339
> URL: https://issues.apache.org/jira/browse/IMPALA-8339
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Critical
>  Labels: Availability, resilience
>
> Impala currently relies on statestore for cluster membership. When an Impala 
> executor goes offline, it may take a while for statestore to declare that 
> node as unavailable and for that information to be propagated to all 
> coordinator nodes. Within this window, some coordinator nodes may still 
> attempt to issue RPCs to the faulty node, resulting in RPC failures which 
> resulted in query failures. In other words, many queries may fail to start 
> within this window until all coordinator nodes get the latest information on 
> cluster membership.
> Going forward, coordinator may need to fall back to using backup executors 
> for each fragments in case some of the executors are not available. Moreover, 
> *coordinator should treat the cluster membership information from statestore 
> (or any external source of truth e.g. etcd) as hints instead of ground truth* 
> and adjust the scheduling of fragment instances based on the availability of 
> the executors from the coordinator's perspective.
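The idea of treating statestore membership as a hint can be sketched as follows. This is only an illustration, not Impala's scheduler: the function name, the ordered candidate list (primary first, then backups), and the set of locally observed failures are all assumptions made for the example.

```python
def pick_executor(candidates, statestore_members, locally_failed):
    """Return the first usable executor from an ordered candidate list.

    candidates:         hosts in preference order (primary, then backups).
    statestore_members: hosts the statestore currently reports as alive (a hint).
    locally_failed:     hosts this coordinator has recently failed to reach.
    """
    for host in candidates:
        # A host must look alive to the statestore AND to this coordinator.
        if host in statestore_members and host not in locally_failed:
            return host
    # No usable executor; the caller decides whether to fail or retry.
    return None
```

With this shape, a coordinator that has just seen an RPC to a host fail can add that host to `locally_failed` and fall back to a backup without waiting for the statestore to catch up.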



[jira] [Assigned] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8663:
---

Assignee: Vihang Karajgaonkar  (was: Dinesh Garg)

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Currently, the file metadata loader recursively lists the table and partition 
> directories to get the fileStatuses. For each FileStatus we ignore hidden 
> files in {{FileSystemUtil.isValidDataFile}}(). However, that is not 
> sufficient. For instance, if Hive is inserting data into a table when the 
> refresh is called, the staging directory may be present within the 
> table directory. This staging directory is a hidden directory named 
> {{.hive-staging_*}}. It may contain files which are not themselves hidden 
> (that is, whose names do not start with . or _). Such files should be treated 
> as temporary files and should not be considered valid data files.
>  
> Another instance where we see this happen is in transactional tables, which 
> have a {{.manifest}} file located in a {{_tmp}} directory within the table 
> directory. This file should also be skipped and not considered as a valid 
> data file.
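The check described in this issue can be sketched roughly as below. This is a hypothetical illustration in Python, not the Java code in Impala's FileMetadataLoader or FileSystemUtil: the function names and the choice of "." and "_" as the only hidden prefixes are assumptions taken from the description above. The point is that every path component is tested, not just the file name.

```python
from pathlib import PurePosixPath

# Prefixes assumed (per the issue text) to mark hidden or temporary entries.
HIDDEN_PREFIXES = (".", "_")


def is_hidden(name: str) -> bool:
    """True if a single path component looks hidden or temporary."""
    return name.startswith(HIDDEN_PREFIXES)


def is_valid_data_file(relative_path: str) -> bool:
    """Accept a file only if neither it nor any parent directory
    (relative to the table directory) is hidden or temporary."""
    parts = PurePosixPath(relative_path).parts
    return bool(parts) and not any(is_hidden(p) for p in parts)


def filter_data_files(relative_paths):
    """Keep only the paths that pass is_valid_data_file."""
    return [p for p in relative_paths if is_valid_data_file(p)]
```

Under these assumptions, a visible file inside `.hive-staging_*` or `_tmp` is rejected because its parent directory is hidden, which a file-name-only check would miss.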



[jira] [Work started] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8663 started by Dinesh Garg.
---
> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Dinesh Garg
>Priority: Major
>
> Currently, the file metadata loader recursively lists the table and partition 
> directories to get the fileStatuses. For each FileStatus we ignore hidden 
> files in {{FileSystemUtil.isValidDataFile}}(). However, that is not 
> sufficient. For instance, if Hive is inserting data into a table when the 
> refresh is called, the staging directory may be present within the 
> table directory. This staging directory is a hidden directory named 
> {{.hive-staging_*}}. It may contain files which are not themselves hidden 
> (that is, whose names do not start with . or _). Such files should be treated 
> as temporary files and should not be considered valid data files.
>  
> Another instance where we see this happen is in transactional tables, which 
> have a {{.manifest}} file located in a {{_tmp}} directory within the table 
> directory. This file should also be skipped and not considered as a valid 
> data file.



[jira] [Resolved] (IMPALA-8436) Disallow write/alter to materialized views

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg resolved IMPALA-8436.
-
Resolution: Fixed

> Disallow write/alter to materialized views
> --
>
> Key: IMPALA-8436
> URL: https://issues.apache.org/jira/browse/IMPALA-8436
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Sudhanshu Arora
>Priority: Critical
>  Labels: impala-acid
>
> Block write/alter into materialized views, but allow select




[jira] [Assigned] (IMPALA-8585) Impala ACID tests

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8585:
---

Assignee: Csaba Ringhofer  (was: Dinesh Garg)

> Impala ACID tests
> -
>
> Key: IMPALA-8585
> URL: https://issues.apache.org/jira/browse/IMPALA-8585
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>
> Umbrella Jira for adding tests about ACID functionality, e.g.:
>  * Ordinary table that was upgraded to ACID table
>  * Inserting data in hive and querying it in Impala concurrently
>  * Compute stats interoperability between Hive and Impala
>  * Partitioned tables, dynamic partitioning



[jira] [Work started] (IMPALA-8585) Impala ACID tests

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8585 started by Dinesh Garg.
---
> Impala ACID tests
> -
>
> Key: IMPALA-8585
> URL: https://issues.apache.org/jira/browse/IMPALA-8585
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Dinesh Garg
>Priority: Critical
>  Labels: impala-acid
>
> Umbrella Jira for adding tests about ACID functionality, e.g.:
>  * Ordinary table that was upgraded to ACID table
>  * Inserting data in hive and querying it in Impala concurrently
>  * Compute stats interoperability between Hive and Impala
>  * Partitioned tables, dynamic partitioning



[jira] [Assigned] (IMPALA-8439) Add Hive ACID tables during dataload if Hive 3.1 is enabled

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8439:
---

Assignee: Yongzhi Chen  (was: Dinesh Garg)

> Add Hive ACID tables during dataload if Hive 3.1 is enabled
> ---
>
> Key: IMPALA-8439
> URL: https://issues.apache.org/jira/browse/IMPALA-8439
> Project: IMPALA
>  Issue Type: Story
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Yongzhi Chen
>Priority: Critical
>  Labels: impala-acid
>
> Test warehouse should include a few transactional tables (insert-only, not 
> insert-only, partitioned, not partitioned, bucketed, not-bucketed, compacted 
> and uncompacted) to enable the testing of ACID features.



[jira] [Work started] (IMPALA-8439) Add Hive ACID tables during dataload if Hive 3.1 is enabled

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8439 started by Dinesh Garg.
---
> Add Hive ACID tables during dataload if Hive 3.1 is enabled
> ---
>
> Key: IMPALA-8439
> URL: https://issues.apache.org/jira/browse/IMPALA-8439
> Project: IMPALA
>  Issue Type: Story
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Dinesh Garg
>Priority: Critical
>  Labels: impala-acid
>
> Test warehouse should include a few transactional tables (insert-only, not 
> insert-only, partitioned, not partitioned, bucketed, not-bucketed, compacted 
> and uncompacted) to enable the testing of ACID features.



[jira] [Reopened] (IMPALA-8439) Add Hive ACID tables during dataload if Hive 3.1 is enabled

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reopened IMPALA-8439:
-

> Add Hive ACID tables during dataload if Hive 3.1 is enabled
> ---
>
> Key: IMPALA-8439
> URL: https://issues.apache.org/jira/browse/IMPALA-8439
> Project: IMPALA
>  Issue Type: Story
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>
> Test warehouse should include a few transactional tables (insert-only, not 
> insert-only, partitioned, not partitioned, bucketed, not-bucketed, compacted 
> and uncompacted) to enable the testing of ACID features.



[jira] [Updated] (IMPALA-8600) Reload partition does not work for transactional tables

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg updated IMPALA-8600:

Labels: impala-acid  (was: )

> Reload partition does not work for transactional tables
> ---
>
> Key: IMPALA-8600
> URL: https://issues.apache.org/jira/browse/IMPALA-8600
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: impala-acid
>
> If a table is transactional, a reload partition call should fetch the valid 
> writeIds. Without doing this, the reload will skip adding all the newly 
> created delta files of the transactional table pertaining to the new writeIds.



[jira] [Assigned] (IMPALA-8631) Ensure that cached data is always up to date to avoid reads based on stale metadata for transactional read only tables

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8631:
---

Assignee: Todd Lipcon

> Ensure that cached data is always up to date to avoid reads based on stale 
> metadata for transactional read only tables 
> ---
>
> Key: IMPALA-8631
> URL: https://issues.apache.org/jira/browse/IMPALA-8631
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Dinesh Garg
>Assignee: Todd Lipcon
>Priority: Major
>  Labels: impala-acid
>
> Acquire latest validWriteIdList in the coordinator and validate that the 
> cached data is up to date. Automatically force refresh with query if it’s not.



[jira] [Work started] (IMPALA-8369) Impala should be able to interoperate with Hive 3.1.0

2019-06-13 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8369 started by Dinesh Garg.
---
> Impala should be able to interoperate with Hive 3.1.0
> -
>
> Key: IMPALA-8369
> URL: https://issues.apache.org/jira/browse/IMPALA-8369
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Dinesh Garg
>Priority: Major
>  Labels: impala-acid
>
> Currently, Impala only works with Hive 2.1.1. Since Hive 3.1.0 has been 
> released for a while, it would be good to add support for Hive 3.1.0 (HMS 
> 3.1.0). This patch will focus on the ability to connect to HMS 3.1.0 and run 
> existing tests. It will not focus on adding support for newer features like 
> ACID in Hive 3.1.0, which can be taken up as a separate JIRA.
> It would be good to make changes to Impala source code such that it can work 
> with both Hive 2.1.0 and Hive 3.1.0 without the need to create a separate 
> branch. However, this should be an aspirational goal. If we hit a blocker we 
> should investigate alternative approaches.



[jira] [Resolved] (IMPALA-8439) Add Hive ACID tables during dataload if Hive 3.1 is enabled

2019-06-12 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg resolved IMPALA-8439.
-
Resolution: Fixed

> Add Hive ACID tables during dataload if Hive 3.1 is enabled
> ---
>
> Key: IMPALA-8439
> URL: https://issues.apache.org/jira/browse/IMPALA-8439
> Project: IMPALA
>  Issue Type: Story
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>
> Test warehouse should include a few transactional tables (insert-only, not 
> insert-only, partitioned, not partitioned, bucketed, not-bucketed, compacted 
> and uncompacted) to enable the testing of ACID features.



[jira] [Assigned] (IMPALA-8440) Add "post upgrade" ACID tables to test data

2019-06-12 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8440:
---

Assignee: Csaba Ringhofer

> Add "post upgrade" ACID tables to test data
> ---
>
> Key: IMPALA-8440
> URL: https://issues.apache.org/jira/browse/IMPALA-8440
> Project: IMPALA
>  Issue Type: Story
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
> Fix For: Impala 3.3.0
>
>
> Include a transactional table in the test data which is in post-upgrade 
> format (what an old table looks like after it becomes transactional).



[jira] [Assigned] (IMPALA-8636) Implement INSERT for insert-only ACID tables

2019-06-12 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8636:
---

Assignee: Zoltán Borók-Nagy  (was: Dinesh Garg)

> Implement INSERT for insert-only ACID tables
> 
>
> Key: IMPALA-8636
> URL: https://issues.apache.org/jira/browse/IMPALA-8636
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala should support insertion for insert-only ACID tables.
> For this we need to allocate a write ID for the target table, and write the 
> data into the base/delta directories.
> INSERT operation should create a new delta directory with the allocated write 
> ID.
> INSERT OVERWRITE should create a new base directory with the allocated write 
> ID. This new base directory will only contain the data coming from this 
> operation.
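The mapping from operation to target directory can be sketched as below. The directory name patterns (`delta_<writeId>_<writeId>` and `base_<writeId>` with 7-digit zero padding) are an assumption based on the usual Hive ACID layout; the issue text itself only says "delta" and "base" directories, and the function name is hypothetical.

```python
def target_dir(write_id: int, overwrite: bool) -> str:
    """Return the directory name an insert with this write ID would use.

    Naming patterns are assumed from the common Hive ACID convention.
    """
    if write_id <= 0:
        raise ValueError("write_id must be a positive integer")
    if overwrite:
        # INSERT OVERWRITE: a new base directory holding only this operation's data.
        return f"base_{write_id:07d}"
    # Plain INSERT: a new delta directory covering exactly this write ID.
    return f"delta_{write_id:07d}_{write_id:07d}"
```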



[jira] [Assigned] (IMPALA-8637) Implement transaction handling and locking for ACID queries

2019-06-12 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8637:
---

Assignee: Zoltán Borók-Nagy  (was: Dinesh Garg)

> Implement transaction handling and locking for ACID queries
> ---
>
> Key: IMPALA-8637
> URL: https://issues.apache.org/jira/browse/IMPALA-8637
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> * Start a transaction before planning
>  * lock tables
>  * heartbeat during execution
>  * mark committed after execution finishes
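The four steps above can be sketched as a small lifecycle wrapper. Everything here is hypothetical: `FakeTxnClient` is an in-memory stand-in that only records call names (it is not a real metastore client), and aborting on failure is an assumption that the issue text does not spell out.

```python
import threading
from contextlib import contextmanager


class FakeTxnClient:
    """Hypothetical stand-in for a metastore client; it only records calls."""

    def __init__(self):
        self.calls = []
        self._lock = threading.Lock()

    def record(self, name):
        with self._lock:
            self.calls.append(name)


@contextmanager
def acid_query(client, tables, heartbeat_interval_s=0.01):
    """Open a transaction, lock tables, heartbeat while the body runs, then commit."""
    client.record("open_txn")  # step 1: start a transaction before planning
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(heartbeat_interval_s):
            client.record("heartbeat")  # step 3: heartbeat during execution

    heartbeater = threading.Thread(target=beat, daemon=True)
    try:
        for table in tables:
            client.record(f"lock:{table}")  # step 2: lock tables
        heartbeater.start()
        try:
            yield
        finally:
            # Stop heartbeating before the transaction is finished either way.
            stop.set()
            heartbeater.join()
    except Exception:
        client.record("abort_txn")  # assumed behavior on failure
        raise
    else:
        client.record("commit_txn")  # step 4: mark committed after execution
```

A caller would wrap planning and execution in `with acid_query(client, ["db.tbl"]): ...`, so the commit or abort happens exactly once regardless of how the body exits.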





