[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

2019-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in 
ThreadTokenAvailableCb
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

2019-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in 
ThreadTokenAvailableCb
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4019/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

2019-04-13 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in 
ThreadTokenAvailableCb
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 13 Apr 2019 20:27:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2772/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 15
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 13 Apr 2019 09:00:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-13 Thread Anurag Mantripragada (Code Review)
Anurag Mantripragada has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..

IMPALA-7971: Add support for insert events in event processor.

This patch adds support for detecting and processing insert events
triggered by impala as well as external engines (eg.Hive).

Inserts from Impala will fire an insert event notification.
Using this event, event-processor will refresh table/partition.
Both insert into and overwrite are supported for tables/partitions.

Known Issues:
1. Inserts into tables from Hive are ignored by the event processor
   as these inserts create an ALTER event first followed by an
   INSERT event. The alter will invalidate table making the refresh
   a no-op. Insert into partitions from hive will create an INSERT
   event first followed by an ALTER event. In this case, there is
   an unnecessary table invalidate after a refresh.
2. Existing self-events logic cannot be used for insert events since
   firing insert event does not allow us to modify table parameters in
   HMS. This means we cannot get the CatalogServiceIdentifiers in insert
   events. Therefore, the event-processor will also refresh the tables
   for which insert operation is performed through Impala.

Testing:
1. Added new custom cluster tests to run different insert commands from
hive and verified new data is available in Impala without invalidate
metadata.

2. Added a test in MetastoreEventsProcessor for testing insert events.

Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
---
M be/src/service/client-request-state.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A tests/custom_cluster/test_event_processing.py
9 files changed, 548 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/12889/15
--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 15
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-13 Thread Anurag Mantripragada (Code Review)
Anurag Mantripragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 15:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3587
PS14, Line 3587:  Map> filesBeforeInsertForPartitions = new 
HashMap<>();
   :   Set filesBeforeInsertForTable = new HashSet<>();
> Both of them can be handled in a single map? Also, rename to filesBeforeIns
Here we track fileNames of all affectedPartitions using a Map. Later, this map is used to retrieve and compare fileNames to 
calculate deltas using partitionIds. However, partitionId cannot be used for a 
non-partitioned table because they change during load().

One solution was to use partitionNames instead of partitionIds. We decided to 
go with Ids because using partitionNames was too error prone.


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3589
PS14, Line 3589: affectedExistingPartitions = new
> just say affectedPartitions ?
Changed it to affectedExistingPartitions. Want to stress on 'existing' as these 
are existing partitions that are involved in this insert event. There could be 
some new partitions created in the insert which are not stored here.


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3608
PS14, Line 3608:  if (catalog_.isExternalEventProcessingEnabled()) {
> not needed.
We don't wish to do this if event processing is disabled right?


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3704
PS14, Line 3704: if (catalog_.isExternalEventProcessingEnabled()) {
> not needed.
We don't wish this path be taken if event processor is disabled. Right?


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3736
PS14, Line 3736: filesBeforeInsertForTable = ((HdfsPartition) 
Iterables
> Like I mentioned above, I don't see why we should use two different datastr
As mentioned before, we cannot keep track of filenames using partitionIds for 
non-partitioned tables as we drop and create the single partition during load. 
(partitionId changes.). Hence two datastructures.


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3743
PS14, Line 3743:   // After loading metadata, fire insert event if external 
event processing is
   :   // enabled.
   :   if (table.getNumClusteringCols() > 0) {
   : createInsertEventsForPartitions(table, 
filesBeforeInsertForPartitions,
   : update.is_overwrite);
   :   } else  {
   : createInsertEventForTable(table, 
filesBeforeInsertForTable, update.is_overwrite);
   :   }
> How about changing the signature to
As mentioned above this is not possible because of two different data 
structures for filesBeforeInsert for partition case and non-partitioned case


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3787
PS14, Line 3787: deltaFiles = ((HdfsPartition)part).getFileNames();
> shouldn't this go before if? tests didn't catch this?
This looks right to me. We want deltaFiles to be empty when it is an overwrite. 
We only calculate the new files if its not an overwrite.


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3791
PS14, Line 3791: deltaFiles.size
> need this?
Removed.


http://gerrit.cloudera.org:8080/#/c/12889/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3813
PS14, Line 3813: deltaFiles = ((HdfsPartition)singlePart).getFileNames();
> same question, shouldn't deltaFiles go before if?
We want delta files to be empty if it is an overwrite.



--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 15
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 13 Apr 2019 08:19:05 +
Gerrit-HasComments: Yes