[jira] [Commented] (IMPALA-8778) Support read/write Apache Hudi tables

2019-08-01 Thread Yuanbin Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898457#comment-16898457
 ] 

Yuanbin Cheng commented on IMPALA-8778:
---

[~vinoth]

I have read the code in Apache Impala related to HdfsTable. Since Hudi 
partitioning is compatible with Hive partitioning, my current thought is to 
change the partition-loading part of the code in Apache Impala, namely the 
loadFileMetadataForPartitions method in the HdfsTable class.

This method groups the partition paths, creates a `FileMetadataLoader` for 
each path, and then calls the load methods in parallel.

Here is the load method in FileMetadataLoader:

[https://github.com/apache/impala/blob/9ee4a5e1940afa47227a92e0f6fba6d4c9909f63/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L129]
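As a rough sketch of the pattern described above (the class and method names below are illustrative stand-ins, not Impala's actual FileMetadataLoader signatures), the per-partition loaders can be fanned out over a thread pool like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: one loader per partition path, loads run in
// parallel. A real loader would list files under the path and build file
// descriptors; here load() just returns a marker string.
class ParallelPartitionLoadSketch {
  static String load(String partitionPath) {
    return "loaded:" + partitionPath;
  }

  static List<String> loadAll(List<String> partitionPaths, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String path : partitionPaths) {
        futures.add(pool.submit(() -> load(path)));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        try {
          results.add(f.get());  // preserves submission order
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```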

Since Impala does not use the InputFormat classes the way Hive does, I think I 
need to modify this partition-loading method to teach Impala how to load the 
Hudi table.

Do you have any ideas on how to load the latest version of a Hudi dataset 
without using the InputFormat as Hive does? Any related code about Hive 
metadata handling in Hudi would also help a lot.

Another thing is that I have created a draft change in Impala's Gerrit.

[https://gerrit.cloudera.org/#/c/13948/]

Currently I just added `HoodieInputFormat` as a VALID_INPUT_FORMAT, which 
makes Impala read the Hudi table as a regular Parquet table.
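A minimal sketch of that whitelist approach (the set contents and names below are assumptions for illustration, not Impala's actual code):

```java
import java.util.Set;

// Illustrative sketch: a table is readable if its InputFormat class is in a
// whitelist. Adding Hudi's input format makes the table be treated like a
// regular Parquet table. The entries below are hypothetical examples.
class ValidInputFormatSketch {
  static final Set<String> VALID_INPUT_FORMATS = Set.of(
      "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "org.apache.hadoop.mapred.TextInputFormat",
      "HoodieInputFormat");  // hypothetical short name for Hudi's format

  static boolean isReadable(String inputFormatClass) {
    return VALID_INPUT_FORMATS.contains(inputFormatClass);
  }
}
```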

I am struggling to add tests in Impala to verify that this change actually 
lets Impala read Hudi data successfully; it seems I need to add Hudi 
dependencies to the test setup and load some data for testing.

> Support read/write Apache Hudi tables
> -
>
> Key: IMPALA-8778
> URL: https://issues.apache.org/jira/browse/IMPALA-8778
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Yuanbin Cheng
>Assignee: Yuanbin Cheng
>Priority: Major
>
> Apache Impala currently does not support Apache Hudi; it cannot even pull 
> metadata from Hive.
> Related issue: 
> [https://github.com/apache/incubator-hudi/issues/179] 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8822) No metadata loading information in query profile when Catalog V2 enabled

2019-08-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898455#comment-16898455
 ] 

Sahil Takiar commented on IMPALA-8822:
--

Thanks Vihang, that helps a lot. Marking this as dependent on 
[IMPALA-8627|http://issues.cloudera.org/browse/IMPALA-8627].

> No metadata loading information in query profile when Catalog V2 enabled
> 
>
> Key: IMPALA-8822
> URL: https://issues.apache.org/jira/browse/IMPALA-8822
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> When local catalog is enabled, we can no longer find table loading 
> information in query profile even just after invalidate metadata for the 
> tables.
> In Catalog V1, you can find the table loading information in query profile 
> like following:
> Query Compilation: 4s401ms
>   - Metadata load started: 661.084us (661.084us)
>   - Metadata load finished. loaded-tables=1/1 load-requests=1 
> catalog-updates=3: 3s819ms (3s819ms)
>  - Analysis finished: 3s820ms (763.979us)






[jira] [Commented] (IMPALA-8822) No metadata loading information in query profile when Catalog V2 enabled

2019-08-01 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898418#comment-16898418
 ] 

Vihang Karajgaonkar commented on IMPALA-8822:
-

Yeah, in catalog-v2 the events on the query profile are different than in 
catalog-v1, so that test needs to be modified to look for different events. 
https://gerrit.cloudera.org/#/c/13933/ should fix that.

> No metadata loading information in query profile when Catalog V2 enabled
> 
>
> Key: IMPALA-8822
> URL: https://issues.apache.org/jira/browse/IMPALA-8822
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> When local catalog is enabled, we can no longer find table loading 
> information in query profile even just after invalidate metadata for the 
> tables.
> In Catalog V1, you can find the table loading information in query profile 
> like following:
> Query Compilation: 4s401ms
>   - Metadata load started: 661.084us (661.084us)
>   - Metadata load finished. loaded-tables=1/1 load-requests=1 
> catalog-updates=3: 3s819ms (3s819ms)
>  - Analysis finished: 3s820ms (763.979us)






[jira] [Work started] (IMPALA-7374) Impala Doc: Doc DATE type

2019-08-01 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7374 started by Alex Rodoni.
---
> Impala Doc: Doc DATE type
> -
>
> Key: IMPALA-7374
> URL: https://issues.apache.org/jira/browse/IMPALA-7374
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>







[jira] [Assigned] (IMPALA-8376) Add per-directory limits for scratch disk usage

2019-08-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8376:
-

Assignee: Tim Armstrong

> Add per-directory limits for scratch disk usage
> ---
>
> Key: IMPALA-8376
> URL: https://issues.apache.org/jira/browse/IMPALA-8376
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> The current syntax is:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad,/data/10/impala/impalad,/data/11/impala/impalad,/data/2/impala/impalad,/data/3/impala/impalad,/data/4/impala/impalad,/data/5/impala/impalad,/data/6/impala/impalad,/data/7/impala/impalad,/data/8/impala/impalad,/data/9/impala/impalad,/data/12/impala/impalad
> {noformat}
> The current syntax for the data cache is
> {noformat}
> --data_cache_dir=/tmp --data_cache_size=500MB
> {noformat}
> One idea is to allow optionally specifying the limit after each directory:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad:500MB,/data/10/impala/impalad:2GB,/data/11/impala/impalad
> {noformat}
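The optional per-directory limit could be parsed roughly as follows. This is a sketch under assumed rules: the suffix grammar ("MB"/"GB") and the "-1 means unlimited" convention are guesses for illustration, not Impala's implementation.

```java
// Sketch of parsing "path[:limit]" scratch-dir specs from the proposal above.
class ScratchDirSpec {
  final String path;
  final long limitBytes;  // -1 means no per-directory limit

  ScratchDirSpec(String path, long limitBytes) {
    this.path = path;
    this.limitBytes = limitBytes;
  }

  static ScratchDirSpec parse(String spec) {
    int colon = spec.lastIndexOf(':');
    if (colon < 0) return new ScratchDirSpec(spec, -1);
    String limit = spec.substring(colon + 1).toUpperCase();
    long multiplier = 1;
    if (limit.endsWith("GB")) {
      multiplier = 1L << 30;
      limit = limit.substring(0, limit.length() - 2);
    } else if (limit.endsWith("MB")) {
      multiplier = 1L << 20;
      limit = limit.substring(0, limit.length() - 2);
    }
    return new ScratchDirSpec(spec.substring(0, colon),
        Long.parseLong(limit) * multiplier);
  }
}
```

Splitting the full flag value on commas and applying `parse` to each entry would yield one spec per directory.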






[jira] [Comment Edited] (IMPALA-8403) Possible thread leak in impalad

2019-08-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898339#comment-16898339
 ] 

Michael Ho edited comment on IMPALA-8403 at 8/1/19 8:28 PM:


This is most likely due to Thrift connections accumulating in the client 
caches of various other Impalads over time. These connections aren't being 
closed. Each connection on the client side corresponds to a thread on the 
server side, and there is no limit enforced on the number of these Thrift 
connection threads for the "backend" service. Over time, the number of Thrift 
connection threads for the backend service grows.

We have converted quite a number of backend services to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.


was (Author: kwho):
This is most likely due to Thrift connections accumulated in the client cache 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side will correspond to a thread on the server 
side and there is no limit enforced on the number of these Thrift connection 
threads for "backend" service. Over time, the number of backend threads grows.

We have converted quite a number of backend service to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The metric of thread-manager.running-threads got from 
> http://${impalad_host}:25000/metrics?json shows that the number of running 
> threads keeps increasing. (See the snapshot) This phenomenon is most 
> noticeable in coordinators.
> Maybe a counter bug or threads leak.






[jira] [Commented] (IMPALA-8403) Possible thread leak in impalad

2019-08-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898339#comment-16898339
 ] 

Michael Ho commented on IMPALA-8403:


This is most likely due to Thrift connections accumulating in the client 
caches of various other Impalads over time. These connections aren't being 
closed. Each connection on the client side corresponds to a thread on the 
server side, and there is no limit enforced on the number of these Thrift 
connection threads for the "backend" service. Over time, the number of backend 
threads grows.

We have converted quite a number of backend services to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The metric of thread-manager.running-threads got from 
> http://${impalad_host}:25000/metrics?json shows that the number of running 
> threads keeps increasing. (See the snapshot) This phenomenon is most 
> noticeable in coordinators.
> Maybe a counter bug or threads leak.






[jira] [Commented] (IMPALA-8738) Add a column representing the type(table or view) in the show tables output

2019-08-01 Thread Thomas Tauber-Marshall (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898336#comment-16898336
 ] 

Thomas Tauber-Marshall commented on IMPALA-8738:


Great. We probably don't want to change our existing 'show tables' to not show 
views, since users may rely on the existing behavior. Adding a 'show views' 
that only shows views would be reasonable, though.

It also sounds like your original suggestion of 'show extended tables' makes sense.

> Add a column representing the type(table or view) in the show tables output
> ---
>
> Key: IMPALA-8738
> URL: https://issues.apache.org/jira/browse/IMPALA-8738
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog, Frontend
>Reporter: baotuquan
>Assignee: baotuquan
>Priority: Minor
>  Labels: features
>
> Now the output of the +*show tables*+ command in the system is as follows:
> {code:java}
> default> show tables;
> Query: show tables
> +--------+
> | name   |
> +--------+
> | table1 |
> | view1  |
> +--------+
> {code}
> I think we should add a column representing the type. The output should be 
> like this:
> {code:java}
> default> show tables;
> Query: show tables
> +--------+-------+
> | name   | type  |
> +--------+-------+
> | table1 | table |
> | view1  | view  |
> +--------+-------+
> {code}
>  






[jira] [Resolved] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched

2019-08-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8780.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

Closing as fixed. We ended up deferring the complete re-factoring of 
IMPALA-8779 to a later patch.

> Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows 
> are fetched
> -
>
> Key: IMPALA-8780
> URL: https://issues.apache.org/jira/browse/IMPALA-8780
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all 
> rows are fetched. The implementation should use the {{RowBatchQueue}} 
> introduced by IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator 
> fragments will be closed if all results fit in the {{RowBatchQueue}}. 
> {{BufferedPlanRootSink::Send}} should enqueue each given {{RowBatch}} onto 
> the queue and then return. If the queue is full, it should block until there 
> is more space left in the queue. {{BufferedPlanRootSink::GetNext}} reads from 
> the queue and then fills in the given {{QueryResultSet}} by using the 
> {{DataSink}} {{ScalarExprEvaluator}}-s. Since the producer thread can call 
> {{BufferedPlanRootSink::Close}} while the consumer is calling 
> {{BufferedPlanRootSink::GetNext}} the two methods need to be synchronized so 
> that the {{DataSink}} {{MemTracker}}-s are not closed while {{GetNext}} is 
> running.
> The implementation of {{BufferedPlanRootSink}} should remain the same 
> regardless of whether a {{std::queue}} backed {{RowBatchQueue}} or a 
> {{BufferedTupleStream}} backed {{RowBatchQueue}} is used.
> {{BufferedPlanRootSink}} and {{BlockingPlanRootSink}} are similar in the 
> sense that {{BlockingPlanRootSink}} buffers one {{RowBatch}}, so for queries 
> that return under 1024 rows, all non-coordinator fragments are closed 
> immediately as well. The advantage of {{BufferedPlanRootSink}} is that it 
> allows buffering for 1+ {{RowBatch}}-es.
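The producer/consumer contract described in the issue can be sketched in a few lines. This is a simplified Java analogue, not Impala's C++ implementation; the method names merely mirror the description.

```java
import java.util.ArrayDeque;

// Simplified analogue of the described behavior: send() blocks while the
// queue is full, getNext() blocks while it is empty, and flushFinal()
// blocks until the consumer has drained every queued batch.
class BufferedSinkSketch<T> {
  private final ArrayDeque<T> queue = new ArrayDeque<>();
  private final int capacity;

  BufferedSinkSketch(int capacity) { this.capacity = capacity; }

  synchronized void send(T batch) {
    while (queue.size() >= capacity) waitUninterruptibly();
    queue.addLast(batch);
    notifyAll();  // wake a consumer blocked in getNext()
  }

  synchronized T getNext() {
    while (queue.isEmpty()) waitUninterruptibly();
    T batch = queue.pollFirst();
    notifyAll();  // wake a producer blocked in send() or flushFinal()
    return batch;
  }

  synchronized void flushFinal() {
    while (!queue.isEmpty()) waitUninterruptibly();
  }

  private void waitUninterruptibly() {
    try { wait(); } catch (InterruptedException e) { throw new RuntimeException(e); }
  }
}
```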







[jira] [Commented] (IMPALA-8812) Impala Doc: SPLIT_PART to support negative indexes

2019-08-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898333#comment-16898333
 ] 

ASF subversion and git services commented on IMPALA-8812:
-

Commit e8bd307941f8734d24b5e3ce61e2b319f59563c5 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e8bd307 ]

IMPALA-8812: [DOCS] Negative index support in SPLIT_PART function

Change-Id: I1b1810d317167fae5e0b050dfd6a7dd7a7762bb3
Reviewed-on: http://gerrit.cloudera.org:8080/13970
Tested-by: Impala Public Jenkins 
Reviewed-by: Norbert Luksa 
Reviewed-by: Alex Rodoni 


> Impala Doc: SPLIT_PART to support negative indexes
> --
>
> Key: IMPALA-8812
> URL: https://issues.apache.org/jira/browse/IMPALA-8812
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>







[jira] [Commented] (IMPALA-8600) Reload partition does not work for transactional tables

2019-08-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898332#comment-16898332
 ] 

ASF subversion and git services commented on IMPALA-8600:
-

Commit 2d819655118c8c6e82649e3c3821311f3dd01174 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2d81965 ]

IMPALA-8600: Refresh transactional tables

Refreshing a subset of partitions in a transactional table might lead
to an inconsistent state of that transactional table. As a fix,
user-initiated partition refreshes are no longer allowed on ACID
tables. Additionally, a refresh-partition Metastore event actually
triggers a refresh of the whole ACID table.

An optimisation is implemented to check the locally cached table-level
writeId, fetch the same from HMS, and do a refresh only if they
don't match.
This couldn't be done for partitioned tables, as apparently Hive
doesn't update the table-level writeId if the transactional table is
partitioned. Similarly, checking the writeId for each partition and
refreshing only the ones where the writeId is not up to date is not
feasible either, as there is no writeId update when Hive makes schema
changes (like adding a column) at either the table level or the
partition level. So after adding a column in Hive to a partitioned
ACID table and refreshing that table in Impala, Impala still wouldn't
see the new column. Hence, I unconditionally refresh the whole table
if it's ACID and partitioned. Note that for non-partitioned ACID
tables Hive updates the table-level writeId even for schema changes.

Change-Id: I1851da22452074dbe253bcdd97145e06c7552cd3
Reviewed-on: http://gerrit.cloudera.org:8080/13938
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 
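The writeId check described in the commit message reduces to a small predicate. This is a hedged sketch with hypothetical names, not the actual catalog code:

```java
// Sketch of the refresh decision described above: partitioned ACID tables
// are refreshed unconditionally because Hive does not advance their
// table-level writeId; otherwise refresh only on a writeId mismatch.
class AcidRefreshSketch {
  static boolean needsRefresh(long localWriteId, long hmsWriteId,
      boolean isPartitioned) {
    if (isPartitioned) return true;
    return localWriteId != hmsWriteId;
  }
}
```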


> Reload partition does not work for transactional tables
> ---
>
> Key: IMPALA-8600
> URL: https://issues.apache.org/jira/browse/IMPALA-8600
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-acid
> Fix For: Impala 3.3.0
>
>
> If a table is transactional, a reload partition call should fetch the valid 
> writeIds. Without doing this, the reload will skip adding all the newly 
> created delta files of the transactional table pertaining to the new writeIds.






[jira] [Commented] (IMPALA-8636) Implement INSERT for insert-only ACID tables

2019-08-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898334#comment-16898334
 ] 

ASF subversion and git services commented on IMPALA-8636:
-

Commit 48bb93d4744f54f609f4f81580b17ef39d1f1a2b in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=48bb93d ]

IMPALA-8636: fix flakiness of ACID INSERT tests

I had to add @UniqueDatabase.parametrize(sync_ddl=True) to some e2e
tests because they were broken in exhaustive mode. When the tests run
with sync_ddl=True then the test files are executed against multiple
impalads which means that each statement in the .test file is executed
against a random impalad.

Change-Id: Ic724e77833ed9ea58268e1857de0d33f9577af8b
Reviewed-on: http://gerrit.cloudera.org:8080/13966
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement INSERT for insert-only ACID tables
> 
>
> Key: IMPALA-8636
> URL: https://issues.apache.org/jira/browse/IMPALA-8636
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: impala-acid
> Fix For: Impala 4.0, Impala 3.3.0
>
>
> Impala should support insertion for insert-only ACID tables.
> For this we need to allocate a write ID for the target table, and write the 
> data into the base/delta directories.
> INSERT operation should create a new delta directory with the allocated write 
> ID.
> INSERT OVERWRITE should create a new base directory with the allocated write 
> ID. This new base directory will only contain the data coming from this 
> operation.






[jira] [Commented] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched

2019-08-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898330#comment-16898330
 ] 

ASF subversion and git services commented on IMPALA-8780:
-

Commit 699450aadbf45f36617472b7c777dc2d9aad066a in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=699450a ]

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue, which is more
indicative of what the queue does
* Re-factors the timers managed by the scan node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Reviewed-on: http://gerrit.cloudera.org:8080/13883
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows 
> are fetched
> -
>
> Key: IMPALA-8780
> URL: https://issues.apache.org/jira/browse/IMPALA-8780
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all 
> rows are fetched. The implementation should use the {{RowBatchQueue}} 
> introduced by IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator 
> fragments will be closed if all results fit in the {{RowBatchQueue}}. 
> {{BufferedPlanRootSink::Send}} should enqueue each given {{RowBatch}} onto 
> the queue and then return. If the queue is full, it should block until there 
> is more space left in the queue. {{BufferedPlanRootSink::GetNext}} reads from 
> the queue and then fills in the given {{QueryResultSet}} by using the 
> {{DataSink}} {{ScalarExprEvaluator}}-s. Since the producer thread can call 
> {{BufferedPlanRootSink::Close}} while the consumer is calling 
> {{BufferedPlanRootSink::GetNext}} the two methods need to be synchronized so 
> that the {{DataSink}} {{MemTracker}}-s are not closed while {{GetNext}} is 
> running.
> The implementation of {{BufferedPlanRootSink}} should remain the same 
> regardless of whether a {{std::queue}} backed {{RowBatchQueue}} or a 
> {{BufferedTupleStream}} backed {{RowBatchQueue}} is used.
> {{BufferedPlanRootSink}} and {{BlockingPlanRootSink}} are similar in the 
> sense that {{BlockingPlanRootSink}} buffers one {{RowBatch}}, so for queries 
> that return under 1024 rows, all non-coordinator fragments are closed 
> immediately as well. The advantage of {{BufferedPlanRootSink}} is that it 
> allows buffering for 1+ {{RowBatch}}-es.






[jira] [Commented] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue

2019-08-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898329#comment-16898329
 ] 

ASF subversion and git services commented on IMPALA-8779:
-

Commit 699450aadbf45f36617472b7c777dc2d9aad066a in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=699450a ]

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue, which is more
indicative of what the queue does
* Re-factors the timers managed by the scan node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Reviewed-on: http://gerrit.cloudera.org:8080/13883
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add RowBatchQueue interface with an implementation backed by a std::queue
> -
>
> Key: IMPALA-8779
> URL: https://issues.apache.org/jira/browse/IMPALA-8779
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Add a {{RowBatchQueue}} interface with an implementation backed by a 
> {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}-es 
> will help with the implementation of {{BufferedPlanRootSink}}. Rather than 
> tie the {{BufferedPlanRootSink}} to a specific method of queuing row batches, 
> we can use an interface. In future patches, a {{RowBatchQueue}} backed by a 
> {{BufferedTupleStream}} can easily be switched out in 
> {{BufferedPlanRootSink}}.
> We should consider re-factoring the existing {{RowBatchQueue}} to use the new 
> interface. The KRPC receiver does some buffering of {{RowBatch}}-es as well 
> which might benefit from the new RowBatchQueue interface, and some more KRPC 
> buffering might be added in IMPALA-6692.
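In outline, the proposed interface plus a std::queue-style implementation might look like this. This is a Java analogue of the C++ design sketched in the issue; the names are illustrative, not the actual classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the proposed abstraction: callers program against the queue
// interface, so a spilling (BufferedTupleStream-backed) implementation
// could later be swapped in without touching BufferedPlanRootSink.
interface RowBatchQueueSketch<T> {
  boolean offer(T batch);  // false when the queue is at capacity
  T poll();                // null when the queue is empty
  boolean isEmpty();
}

// std::queue-style implementation with a fixed maximum capacity.
class DequeRowBatchQueueSketch<T> implements RowBatchQueueSketch<T> {
  private final Deque<T> deque = new ArrayDeque<>();
  private final int maxCapacity;

  DequeRowBatchQueueSketch(int maxCapacity) { this.maxCapacity = maxCapacity; }

  public boolean offer(T batch) {
    if (deque.size() >= maxCapacity) return false;
    deque.addLast(batch);
    return true;
  }

  public T poll() { return deque.pollFirst(); }

  public boolean isEmpty() { return deque.isEmpty(); }
}
```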






[jira] [Commented] (IMPALA-8822) No metadata loading information in query profile when Catalog V2 enabled

2019-08-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898323#comment-16898323
 ] 

Sahil Takiar commented on IMPALA-8822:
--

[~vihangk1] any ideas what is going on here? Seems necessary for IMPALA-8795.

> No metadata loading information in query profile when Catalog V2 enabled
> 
>
> Key: IMPALA-8822
> URL: https://issues.apache.org/jira/browse/IMPALA-8822
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> When local catalog is enabled, we can no longer find table loading 
> information in the query profile, even just after running INVALIDATE 
> METADATA for the tables.
> In Catalog V1, you can find the table loading information in query profile 
> like following:
> Query Compilation: 4s401ms
>   - Metadata load started: 661.084us (661.084us)
>   - Metadata load finished. loaded-tables=1/1 load-requests=1 
> catalog-updates=3: 3s819ms (3s819ms)
>  - Analysis finished: 3s820ms (763.979us)






[jira] [Commented] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-08-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898319#comment-16898319
 ] 

Sahil Takiar commented on IMPALA-8818:
--

Makes sense. I'm considering adding two query options then:
 * {{MAX_PINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of pinned 
memory used for result spooling by setting a max reservation for the 
{{PlanRootSink}}
 ** In terms of the actual code, this will be used to set 
{{TBackendResourceProfile.max_reservation}}
 ** A value of 0 means the memory is unbounded, so no max reservation is set 
(which means {{Long.MAX_VALUE}} is used for the max reservation value), but as 
you said, the query-wide limit still applies
 ** Considering a default of 100 MB
 * {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of unpinned 
memory used for result spooling
 ** I think this requires some changes to {{BufferedTupleStream}} to track how 
much of its memory is unpinned (e.g. add an unpinned version of 
{{BufferedTupleStream::BytesPinned}})
 ** Based on my understanding of {{BufferedTupleStream}}, a call to 
{{UnpinStream}} unpins all the pages in the stream; this means that 
{{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} must be >= 
{{MAX_PINNED_RESULT_SPOOLING_MEMORY}} so that when {{UnpinStream}} is called, 
we don't exceed the value of {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}}
 ** I don't see a straightforward way to make this a hard limit because 
unpinned pages are not reserved (maybe I'm missing something), but I think for 
now it is sufficient to make this a soft limit (e.g. adding a {{RowBatch}} to 
the stream may push the amount of unpinned memory over the limit, but attempts 
to add additional batches will block)
 ** Considering a default of 1 GB

A few things I'm still trying to understand in BTS:
 * When a stream is unpinned, are new pages pinned or unpinned?
 * When do unpinned pages get spilled to disk / what decides if unpinned pages 
are spilled?
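Given the constraint that {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} must be >= {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, a hedged sketch of how the two options might be validated (the struct, field names, and validation function are hypothetical; only the option names and defaults come from this discussion):

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical holder for the two proposed query options. Per the discussion
// above, a pinned limit of 0 means "unbounded" (no max reservation is set,
// i.e. the maximum value is used).
struct ResultSpoolingOptions {
  long long max_pinned_bytes = 100LL * 1024 * 1024;     // proposed 100 MB default
  long long max_unpinned_bytes = 1024LL * 1024 * 1024;  // proposed 1 GB default
};

// UnpinStream unpins every page in the stream, so the unpinned limit must be
// at least as large as the pinned limit; otherwise a single UnpinStream call
// could immediately exceed it.
void ValidateSpoolingOptions(const ResultSpoolingOptions& opts) {
  long long effective_pinned =
      opts.max_pinned_bytes == 0 ? LLONG_MAX : opts.max_pinned_bytes;
  if (opts.max_unpinned_bytes < effective_pinned) {
    throw std::invalid_argument(
        "MAX_UNPINNED_RESULT_SPOOLING_MEMORY must be >= "
        "MAX_PINNED_RESULT_SPOOLING_MEMORY");
  }
}
```

One consequence of this rule is that an unbounded pinned limit forces the unpinned limit to be effectively unbounded as well.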

> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if 
> the row still could not be added, then an error must have occurred, perhaps 
> an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire 
> fact table, and end up spilling all the contents to disk, which can 
> potentially take up a large amount of space. So there needs to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory, perhaps through a new config option 
> {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} (maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the 
> number of rows returned by a query, and so it should limit the number of rows 
> buffered by the BTS as well (although it is set to 0 by default). 
> SCRATCH_LIMIT already limits the amount of disk space used for spilling 
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer 
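The unpin-and-retry flow described under *BufferedTupleStream Usage* above could be sketched as follows. The stream here is a stub invented for illustration; the real BufferedTupleStream API takes rows and status out-parameters and differs in detail:

```cpp
#include <cassert>

// Minimal stand-in for BufferedTupleStream: AddRow fails once the pinned
// reservation is exhausted, and always succeeds after UnpinStream unless an
// I/O error occurs (modeled here by the io_error flag).
struct StubTupleStream {
  bool pinned = true;       // the stream starts off fully pinned
  int pinned_capacity = 2;  // rows that fit in the pinned reservation
  int rows = 0;
  bool io_error = false;

  bool AddRow() {
    if (io_error) return false;
    if (pinned && rows >= pinned_capacity) return false;
    ++rows;
    return true;
  }
  void UnpinStream() { pinned = false; }
};

enum class Status { OK, ERROR };

// Sketch of the proposed add path: if AddRow fails, unpin the stream and
// retry once; a second failure means a real error (e.g. an I/O error), so
// surface it and fail the query.
Status AddRowToQueue(StubTupleStream* stream) {
  if (stream->AddRow()) return Status::OK;
  stream->UnpinStream();  // fall back to unpinned (spillable) mode
  if (stream->AddRow()) return Status::OK;
  return Status::ERROR;
}
```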

[jira] [Closed] (IMPALA-8812) Impala Doc: SPLIT_PART to support negative indexes

2019-08-01 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8812.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Impala Doc: SPLIT_PART to support negative indexes
> --
>
> Key: IMPALA-8812
> URL: https://issues.apache.org/jira/browse/IMPALA-8812
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>







[jira] [Created] (IMPALA-8824) Impala Doc: Document DROP TABLE for Insert-only ACID Tables

2019-08-01 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8824:
---

 Summary: Impala Doc: Document DROP TABLE for Insert-only ACID 
Tables
 Key: IMPALA-8824
 URL: https://issues.apache.org/jira/browse/IMPALA-8824
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni










[jira] [Assigned] (IMPALA-8793) Implement TRUNCATE for insert-only ACID tables

2019-08-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8793:
-

Assignee: Zoltán Borók-Nagy

> Implement TRUNCATE for insert-only ACID tables
> --
>
> Key: IMPALA-8793
> URL: https://issues.apache.org/jira/browse/IMPALA-8793
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot TRUNCATE insert-only tables.
> TRUNCATE is a DDL statement that deletes all the files and drops all column 
> and table statistics. (Impala currently cannot truncate specific partitions, 
> only the whole table. Truncating specific partitions is out of scope of this 
> Jira.)
> TRUNCATE doesn't just create a new empty base directory; it really removes 
> all the files. This is the behavior of Hive as well.
> To implement TRUNCATE, Impala must acquire an EXCLUSIVE lock on the table. 
> After that, Impala must recursively delete all the data files belonging to 
> the table.






[jira] [Assigned] (IMPALA-8823) Implement DROP TABLE for insert-only ACID tables

2019-08-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8823:
-

Assignee: Zoltán Borók-Nagy

> Implement DROP TABLE for insert-only ACID tables
> 
>
> Key: IMPALA-8823
> URL: https://issues.apache.org/jira/browse/IMPALA-8823
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-acid
>
> Impala currently cannot drop insert-only ACID tables.
> To implement DROP TABLE for insert-only tables at first we need to acquire an 
> exclusive lock from HMS, then proceed with the usual DROP TABLE process.
> Heartbeating the lock might be also needed.






[jira] [Created] (IMPALA-8823) Implement DROP TABLE for insert-only ACID tables

2019-08-01 Thread JIRA
Zoltán Borók-Nagy created IMPALA-8823:
-

 Summary: Implement DROP TABLE for insert-only ACID tables
 Key: IMPALA-8823
 URL: https://issues.apache.org/jira/browse/IMPALA-8823
 Project: IMPALA
  Issue Type: Improvement
Reporter: Zoltán Borók-Nagy


Impala currently cannot drop insert-only ACID tables.

To implement DROP TABLE for insert-only tables, we first need to acquire an 
exclusive lock from HMS, then proceed with the usual DROP TABLE process.

Heartbeating the lock might also be needed.





[jira] [Resolved] (IMPALA-8600) Reload partition does not work for transactional tables

2019-08-01 Thread Gabor Kaszab (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-8600.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Reload partition does not work for transactional tables
> ---
>
> Key: IMPALA-8600
> URL: https://issues.apache.org/jira/browse/IMPALA-8600
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-acid
> Fix For: Impala 3.3.0
>
>
> If a table is transactional, a reload partition call should fetch the valid 
> writeIds. Without doing this, the reload will skip adding all the newly 
> created delta files of the transactional table pertaining to the new writeIds.







[jira] [Commented] (IMPALA-8600) Reload partition does not work for transactional tables

2019-08-01 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898212#comment-16898212
 ] 

Gabor Kaszab commented on IMPALA-8600:
--

Created a follow-up Jira to refresh only a subset of partitions, based on 
writeId changes, in the case of a full table refresh:
https://issues.apache.org/jira/browse/IMPALA-8809

> Reload partition does not work for transactional tables
> ---
>
> Key: IMPALA-8600
> URL: https://issues.apache.org/jira/browse/IMPALA-8600
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-acid
>
> If a table is transactional, a reload partition call should fetch the valid 
> writeIds. Without doing this, the reload will skip adding all the newly 
> created delta files of the transactional table pertaining to the new writeIds.






[jira] [Created] (IMPALA-8822) No metadata loading information in query profile when Catalog V2 enabled

2019-08-01 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created IMPALA-8822:


 Summary: No metadata loading information in query profile when 
Catalog V2 enabled
 Key: IMPALA-8822
 URL: https://issues.apache.org/jira/browse/IMPALA-8822
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.2.0
Reporter: Yongzhi Chen


When local catalog is enabled, we can no longer find table loading information 
in the query profile, even just after running INVALIDATE METADATA for the tables.
In Catalog V1, you can find the table loading information in query profile like 
following:
Query Compilation: 4s401ms
  - Metadata load started: 661.084us (661.084us)
  - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3: 3s819ms (3s819ms)
 - Analysis finished: 3s820ms (763.979us)









[jira] [Updated] (IMPALA-8815) Ranger fail to start in minicluster if you source set-classpath.sh

2019-08-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8815:
--
Description: 
The below appear in logs/cluster/ranger/catalina.out
{noformat}
java.io.FileNotFoundException: 
/home/tarmstrong/Impala/incubator-impala/toolchain/cdh_components-1173663/hive-2.1.1-cdh6.x-SNAPSHOT/lib/datanucleus-api-jdo-3.2.1.jar
 (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.(ZipFile.java:225)
at java.util.zip.ZipFile.(ZipFile.java:155)
at java.util.jar.JarFile.(JarFile.java:166)
at java.util.jar.JarFile.(JarFile.java:103)
at sun.net.www.protocol.jar.URLJarFile.(URLJarFile.java:93)
at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:99)
at 
sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
at 
sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
at org.apache.tomcat.util.scan.FileUrlJar.(FileUrlJar.java:48)
at 
org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:34)
at 
org.apache.catalina.startup.ContextConfig$FragmentJarScannerCallback.scan(ContextConfig.java:2682)
at 
org.apache.tomcat.util.scan.StandardJarScanner.process(StandardJarScanner.java:325)
at 
org.apache.tomcat.util.scan.StandardJarScanner.doScanClassPath(StandardJarScanner.java:235)
at 
org.apache.tomcat.util.scan.StandardJarScanner.scan(StandardJarScanner.java:196)
at 
org.apache.catalina.startup.ContextConfig.processJarsForWebFragments(ContextConfig.java:1913)
at 
org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1264)
at 
org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:881)
at 
org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:388)
at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at 
org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5606)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1707)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1697)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}

Workaround:
{noformat}
(unset CLASSPATH && . bin/impala-config.sh && ./testdata/bin/run-all.sh )
{noformat}
or
{noformat}
(unset CLASSPATH && . bin/impala-config.sh && 
./testdata/bin/run-ranger-server.sh )
{noformat}

[jira] [Resolved] (IMPALA-8820) start-impala-cluster can't find catalogd on recent Ubuntu 16.04

2019-08-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8820.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> start-impala-cluster can't find catalogd on recent Ubuntu 16.04
> ---
>
> Key: IMPALA-8820
> URL: https://issues.apache.org/jira/browse/IMPALA-8820
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
> Fix For: Impala 3.3.0
>
>
> We noticed a bug, first on jenkins.impala.io and then on local ubuntu 16.04 
> environments where start-impala-cluster fails to find a catalogd process, 
> despite the catalogd starting up fine.
> It appears that this is somehow related to an automatic update - it started 
> reproducing for me after I installed the batch of packages on 2019-07-31. 
> I've included the previous upgrade to illustrate the window in which this 
> happened.
> {noformat}
> Start-Date: 2019-07-27  00:21:00
> Commandline: apt-get dist-upgrade
> Requested-By: tarmstrong (1000)
> Install: linux-image-4.4.0-157-generic:amd64 (4.4.0-157.185, automatic), 
> linux-headers-4.4.0-157:amd64 (4.4.0-157.185, automatic), 
> linux-tools-4.4.0-157:amd64 (4.4.0-157.185, automatic), 
> linux-tools-4.4.0-157-generic:amd64 (4.4.0-157.185, automatic), 
> linux-headers-4.4.0-157-generic:amd64 (4.4.0-157.185, automatic), 
> linux-modules-extra-4.4.0-157-generic:amd64 (4.4.0-157.185, automatic), 
> linux-modules-4.4.0-157-generic:amd64 (4.4.0-157.185, automatic)
> Upgrade: linux-tools-generic:amd64 (4.4.0.154.162, 4.4.0.157.165), 
> linux-headers-generic:amd64 (4.4.0.154.162, 4.4.0.157.165), 
> linux-libc-dev:amd64 (4.4.0-154.181, 4.4.0-157.185), mysql-client-5.7:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), mysql-server-5.7:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), libmysqlclient-dev:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), linux-image-generic:amd64 
> (4.4.0.154.162, 4.4.0.157.165), mysql-server:amd64 (5.7.26-0ubuntu0.16.04.1, 
> 5.7.27-0ubuntu0.16.04.1), mysql-client:amd64 (5.7.26-0ubuntu0.16.04.1, 
> 5.7.27-0ubuntu0.16.04.1), mysql-client-core-5.7:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), mysql-common:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), libmysqlclient20:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), firefox:amd64 
> (68.0+build3-0ubuntu0.16.04.1, 68.0.1+build1-0ubuntu0.16.04.1), 
> linux-tools-common:amd64 (4.4.0-154.181, 4.4.0-157.185), patch:amd64 
> (2.7.5-1ubuntu0.16.04.1, 2.7.5-1ubuntu0.16.04.2), linux-generic:amd64 
> (4.4.0.154.162, 4.4.0.157.165), mysql-server-core-5.7:amd64 
> (5.7.26-0ubuntu0.16.04.1, 5.7.27-0ubuntu0.16.04.1), docker-ce:amd64 
> (5:19.03.0~3-0~ubuntu-xenial, 5:19.03.1~3-0~ubuntu-xenial), 
> docker-ce-cli:amd64 (5:19.03.0~3-0~ubuntu-xenial, 5:19.03.1~3-0~ubuntu-xenial)
> End-Date: 2019-07-27  00:22:33
> Start-Date: 2019-07-31  12:55:00
> Commandline: apt-get dist-upgrade
> Requested-By: tarmstrong (1000)
> Upgrade: openjdk-8-jdk:amd64 (8u212-b03-0ubuntu1.16.04.1, 
> 8u222-b10-1ubuntu1~16.04.1), libldap-2.4-2:amd64 (2.4.42+dfsg-2ubuntu3.5, 
> 2.4.42+dfsg-2ubuntu3.6), openjdk-8-jre:amd64 (8u212-b03-0ubuntu1.16.04.1, 
> 8u222-b10-1ubuntu1~16.04.1), slack-desktop:amd64 (4.0.0, 4.0.1), 
> libsvn1:amd64 (1.9.3-2ubuntu1.1, 1.9.3-2ubuntu1.3), 
> openjdk-8-jdk-headless:amd64 (8u212-b03-0ubuntu1.16.04.1, 
> 8u222-b10-1ubuntu1~16.04.1), libsvn-perl:amd64 (1.9.3-2ubuntu1.1, 
> 1.9.3-2ubuntu1.3), subversion:amd64 (1.9.3-2ubuntu1.1, 1.9.3-2ubuntu1.3), 
> openjdk-8-jre-headless:amd64 (8u212-b03-0ubuntu1.16.04.1, 
> 8u222-b10-1ubuntu1~16.04.1)
> End-Date: 2019-07-31  12:55:14
> {noformat}
> The issue is that the catalogd process's name is now "main" instead of 
> "catalogd".
> I think we can fix this by changing our scripts to fall back to checking the 
> binary name.







[jira] [Resolved] (IMPALA-8593) Prohibit write to bucketed table

2019-08-01 Thread Yongzhi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen resolved IMPALA-8593.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Prohibit write to bucketed table
> 
>
> Key: IMPALA-8593
> URL: https://issues.apache.org/jira/browse/IMPALA-8593
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Critical
>  Labels: impala-acid
> Fix For: Impala 3.3.0
>
>
> Impala does not support writing to bucketed tables, so we need to prohibit 
> these unsupported operations.







[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT <template>)

2019-08-01 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897818#comment-16897818
 ] 

Gabor Kaszab commented on IMPALA-4018:
--

Hey [~arodoni_cloudera],
It's under review and we've already gone through a few rounds of comments, so I 
expect this to be submitted soon. I'm not sure about the 3.3 release timeline, 
but I'd say Milestone 1 of this should make it.

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT <template>)
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates are currently implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd, yyyy hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd, yyyy HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','yyyy-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','yyyy-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-MM-dd 
> HH:mm:ss.SSSSSS');
> {noformat}
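To make the incompatibility above concrete, translating between the two pattern dialects is essentially a token-by-token rewrite. The following is a minimal sketch; the token table is illustrative and deliberately incomplete, not Impala's actual mapping.

```python
# Map standard-SQL datetime tokens (PostgreSQL/Oracle style) to their
# Java SimpleDateFormat equivalents. Longer tokens are listed first so
# that e.g. "hh24" wins over a bare "hh".
SQL_TO_SIMPLEDATEFORMAT = [
    ("yyyy", "yyyy"),   # 4-digit year is spelled the same
    ("hh24", "HH"),     # 24-hour clock
    ("mon",  "MMM"),    # abbreviated month name
    ("mi",   "mm"),     # minutes
    ("mm",   "MM"),     # month number
    ("dd",   "dd"),     # day of month
    ("ss",   "ss"),     # seconds
    ("ff",   "SSSSSS"), # fractional seconds (precision is a guess)
]

def to_simple_date_format(template: str) -> str:
    """Translate a lowercase standard-SQL datetime template."""
    out = []
    i = 0
    while i < len(template):
        for sql_tok, java_tok in SQL_TO_SIMPLEDATEFORMAT:
            if template.startswith(sql_tok, i):
                out.append(java_tok)
                i += len(sql_tok)
                break
        else:
            out.append(template[i])  # delimiters pass through unchanged
            i += 1
    return "".join(out)

print(to_simple_date_format("yyyy-mm-dd hh24:mi:ss"))
```

Note the trap that makes the two dialects genuinely incompatible: `mm` means month in the SQL dialect but minutes in SimpleDateFormat, so a naive case-insensitive reuse of one parser for the other silently misparses.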
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to choose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
> <datetime template> ::=
>   { <datetime template field> }...
> <datetime template field> ::=
>   <datetime template component>
>   | <datetime template delimiter>
> <datetime template component> ::=
>   <years component>
>   | <rounded years component>
>   | <months component>
>   | <days of month component>
>   | <days of year component>
>   | <12-hours component>
>   | <24-hours component>
>   | <minutes component>
>   | <seconds of minute component>
>   | <seconds of day component>
>   | <fractional seconds component>
>   | <am/pm component>
>   | <time zone hours component>
>   | <time zone minutes component>
> <datetime template delimiter> ::=
>   <minus sign>
>   | <period>
>   | <solidus>
>   | <comma>
>   | <apostrophe>
>   | <semicolon>
>   | <colon>
>   | <space>
> <years component> ::=
>   YYYY | YYY | YY | Y
> <rounded years component> ::=
>   RRRR | RR
> <months component> ::=
>   MM
> <days of month component> ::=
>   DD
> <days of year component> ::=
>   DDD
> <12-hours component> ::=
>   HH | HH12
> <24-hours component> ::=
>   HH24
> <minutes component> ::=
>   MI
> <seconds of minute component> ::=
>   SS
> <seconds of day component> ::=
>   SSSSS
> <fractional seconds component> ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
> <am/pm component> ::=
>   A.M. | P.M.
> <time zone hours component> ::=
>   TZH
> <time zone minutes component> ::=
>   TZM
> {noformat}
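A template conforming to the SQL:2016 grammar above can be split into tokens with a simple longest-match scan over the field mnemonics, where the listed delimiter characters pass through unchanged. This is a minimal sketch for illustration, not Impala's implementation.

```python
# Field mnemonics from the SQL:2016 datetime template grammar.
FIELDS = (["YYYY", "YYY", "YY", "Y", "RRRR", "RR", "MM", "DD", "DDD",
           "HH", "HH12", "HH24", "MI", "SS", "SSSSS",
           "A.M.", "P.M.", "TZH", "TZM"]
          + [f"FF{n}" for n in range(1, 10)])

# The grammar's delimiters: minus, period, solidus, comma, apostrophe,
# semicolon, colon, and space.
DELIMITERS = set("-./,';: ")

def tokenize(template):
    """Split a datetime template into (kind, text) tokens."""
    # Try longer mnemonics first so HH24 wins over HH, DDD over DD, etc.
    ordered = sorted(FIELDS, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(template):
        for field in ordered:
            if template.startswith(field, i):
                tokens.append(("field", field))
                i += len(field)
                break
        else:
            ch = template[i]
            if ch not in DELIMITERS:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
            tokens.append(("delimiter", ch))
            i += 1
    return tokens

print(tokenize("YYYY-MM-DD HH24:MI:SS"))
```

The longest-match ordering is the one subtlety: without it, `HH24` would be consumed as `HH` followed by the invalid characters `2` and `4`.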
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
> <cast specification> ::=
>   CAST <left paren> <cast operand>
>     AS <cast target>
>     [ FORMAT <cast template> ]
>   <right paren>
> <cast operand> ::=
>   <value expression>
>   | <implicitly typed value specification>
> <cast target> ::=
>   <domain name>
>   | <data type>
> <cast template> ::=
>   <character string literal>
> {noformat}
> For example:
> {noformat}
> CAST(<datetime> AS <char string type> [FORMAT <template>])
> CAST(<char string> AS <datetime type> [FORMAT <template>])
> cast(dt as string format 'DD-MM-YYYY')
> cast('01-05-2017' as date format 'DD-MM-YYYY')
> {noformat}
> *Update*
> Here is the proposal for the new datetime patterns and their semantics:
> https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/



