[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-12-03 Thread Pranay Singh (Code Review)
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/10587 
)

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh (320)
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh (320)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-08-01 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@9
PS4, Line 9: new row
this should apply more generally to new files, not just rows? either way, the 
wording here is inconsistent with the title, which refers to "file-only 
operations".


http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@17
PS4, Line 17: the
all? which partitions?


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1417
PS4, Line 1417: a
nit: an


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1420
PS4, Line 1420: by calling
nit: from


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@254
PS4, Line 254: reloads the table schema from Hive Meta Store
can in-line this comment directly on L259. same with the other descriptions of 
what each flag does when set.

doing so will remove some redundancy in the comment.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@585
PS4, Line 585: noneOf(MetaDataLoadFlag.class);
 : flags.add
simplify to .of(...) since we're always adding RELOAD_PARTITION_METADATA.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@663
PS4, Line 663: MetaDataLoadFlag of type 'RELOAD_TABLE_METADATA', 
'RELOAD_PARTITION_METADATA',
 :* 'RELOAD_FILE_METADATA'
replace with just flags.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@670
PS4, Line 670: msTbl
why is this pulled out now as a parameter. I see one place where its used-- 
what requires this change?
if its really needed, please add a comment (e.g., what does a null msTbl do?)


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3484
PS4, Line 3484: noneOf(MetaDataLoadFlag.class);
  :   flags.add(MetaDataLoadFlag.RELOAD_FILE_METADATA)
simplify to of(...) since the flag on L3485 is always included.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3486
PS4, Line 3486:  == true
remove



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:26:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-07-20 Thread Pranay Singh (Code Review)
Hello Bharath Vissapragada, Balazs Jeszenszky, Todd Lipcon, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10587

to look at the new patch set (#4).

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..

IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

The problem is that while inserting a new row to an existing partition
of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary
call to Hive MetaStore to do a getTable() and list the partitions of the
table. In such a case where a row is being inserted to an existing
partition or to an unpartitioned table only the file metadata for
HDFS tables needs to be updated.

This change avoids calling the Hive Meta Store in updatePartitionsFromHms()
by providing a hint from the caller, to skip loading the partitions.

Testing:
  Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 126 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/4
--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-07-20 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251
PS3, Line 1251:  RELOAD_PARTITION_METADATA if not set in flags results in not 
getting the
  :* partition details from HiveMetaStore in 
updatePartitionsFromHms().
> nit: this double negative is a bit confusing. Perhaps it's clearer to rephr
Changed the comment, thanks for rewording it.


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254
PS3, Line 1254: or if there are no partitions in the table
> I'm not sure I follow this part. How does the caller know if there are no p
That's right the unpartitioned case doesn't have to consider this, I have 
removed the sentence.



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:57:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-07-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 3:

(3 comments)

Just left some notes on the HdfsTable side. I dont know the CatalogOpExecutor 
code well so probably better for someone else to take a look at that part.

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251
PS3, Line 1251:  RELOAD_PARTITION_METADATA if not set in flags results in not 
getting the
  :* partition details from HiveMetaStore in 
updatePartitionsFromHms().
nit: this double negative is a bit confusing. Perhaps it's clearer to rephrase 
like:

If RELOAD_PARTITION_METADATA is set in flags, the partition details will be 
reloaded from the HMS. In some cases, this may not be necessary, and when 
unset, an expensive call to the HMS can be avoided.


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254
PS3, Line 1254: or if there are no partitions in the table
I'm not sure I follow this part. How does the caller know if there are no 
partitions in the table without fetching the list of partitions? In the case of 
an unpartitioned table (no partition keys) the code already takes care of this 
internally without the caller having to pass it, right?


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1418
PS3, Line 1418:  Updates the partitions of an HdfsTable so that they are in 
sync with the Hive
  :* Metastore.
this doesn't seem to be an accurate description of the function, since the 
function also reloads file metadata, right? Maybe the function should just be 
called updatePartitionMetadata? Then it's clearer that the flags specify 
whether we want to update the file metadata, the list of partitions, or both, 
right?



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 20 Jul 2018 16:57:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

2018-07-19 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10
PS2, Line 10: unecessary
> nit: check spelling and remove extra spaces.
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9: unpartitioned HDFS table,
   :   or to an existing partition
> simplify to just HDFS table, or is there some case that's being excluded?
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21
PS2, Line 21:   Testing: Ran core test without failure.
> Is there any new scenario that you want to test for this change? If existin
Yes I want to test the scenario where concurrent removal
of partition and insertion of row to the same partition happens.

a) Impala is inserting the row to an existing partition
b) Hive is removing the partition at the same time.


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
> needs a comment.
Done


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
> needs a comment.
Added the comment


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365
PS2, Line 1365:   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
> any way to refactor this function so you don't need to duplicate this code
Refactored the code and added a new function to reload the partition.


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640
PS2, Line 640:   private void loadTableMetadata(Table tbl, long 
newCatalogVersion,
> the arguments here are getting quite unwieldly, and it seems that the two e
Made changes to the code to use enum MetadataLoadFlag



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 19 Jul 2018 18:27:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

2018-07-19 Thread Pranay Singh (Code Review)
Hello Bharath Vissapragada, Todd Lipcon, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10587

to look at the new patch set (#3).

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations
..

IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

The problem is that while inserting a new row to an existing partition
of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary
call to Hive MetaStore to do a getTable() and list the partitions of the
table. In such a case where a row is being inserted to an existing
partition or to an unpartitioned table only the file metadata for
HDFS tables needs to be updated.

This change avoids calling the Hive Meta Store in updatePartitionsFromHms()
by providing a hint from the caller, to skip loading the partitions.

Testing:
  Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 126 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/3
--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

2018-07-06 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365
PS2, Line 1365:   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
any way to refactor this function so you don't need to duplicate this code from 
line 1429-1435 up to here? perhaps extracting another function for all of the 
partition-reloading code and only call it when you dont have the 'no new 
partitions' hint?


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640
PS2, Line 640:   private void loadTableMetadata(Table tbl, long 
newCatalogVersion,
the arguments here are getting quite unwieldly, and it seems that the two 
existing booleans have a positive sense ("DO reload filemetadata" and "DO 
reload table schema") whereas this new boolean has a negative sense "DONT 
reload data from HMS".

How about refactoring this slightly to use an enum and an EnumSet, something 
like:

static enum MetadataLoadFlag {
  RELOAD_TABLE_METADATA,
  RELOAD_PARTITION_METADATA,
  RELOAD_FILE_METADATA
}

void loadTableMetadata(Table tbl, long newCatalogVersion,
  EnumSet flags,
  Set partitionsToUpdate,
  Table msTable);

.. and perhaps add Preconditions that validate that the optional msTable, 
partitionsToUpdate, etc params are only set when it makes sense according to 
the flags?


I think this would make the call sites and code a lot more understandable.



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Jul 2018 17:54:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

2018-07-06 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
nit: add a space


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9:
remove the description block's indent


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10
PS2, Line 10: unecessary
nit: check spelling and remove extra spaces.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9: unpartitioned HDFS table,
   :   or to an existing partition
simplify to just HDFS table, or is there some case that's being excluded?


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@14
PS2, Line 14: This extra call to HMS introduces a point of failure, we need to 
handle
:   for error scenario when Hive MetaStore crashes
So is this change an optimization, a correctness/robustness fix, or both? Also, 
I don't understand the point about a failure-- pls add an example with a 
sequence of steps that currently leads to an error. Can add such an example to 
the jira.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@15
PS2, Line 15: This change removes the
:   extra call to Hive Meta Store for the case when a row is 
inserted to an
:   existing partition in HDFS table or when a row is added to 
unpartitioned
:   table. Thus, an optimization as it reduces the call to Hive 
Meta Store
:   during Update of Catalog.
lots of redundancy here with the prev. paragraph-- pls condense or remove.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21
PS2, Line 21:   Testing: Ran core test without failure.
Is there any new scenario that you want to test for this change? If existing 
tests cover it, is there any way to assert that this change is operating as 
expected?


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
needs a comment.



--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Jul 2018 16:19:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

2018-06-02 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10587


Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only 
operations
..

IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

  The problem is that while inserting a  new row to an unpartitioned HDFS table,
  or to an existing partition, the catalogd makes unecessary call to Hive Meta
  Store to do a getTable() and list the partitions of the table, when in fact
  only the file metadata for HDFS tables needs to be updated.

  This extra call to HMS introduces a point of failure, we need to handle
  for error scenario when Hive MetaStore crashes. This change removes the
  extra call to Hive Meta Store for the case when a row is inserted to an
  existing partition in HDFS table or when a row is added to unpartitioned
  table. Thus, an optimization as it reduces the call to Hive Meta Store
  during Update of Catalog.

  Testing: Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 34 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/1
--
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-21 Thread Pranay Singh (Code Review)
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/10450 
)

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-21 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410
PS2, Line 1410: partitionsToUpdateFileMdByPath = 
getPartitionsByPath(partitionsToUpdate);
  :   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
> I don't think so. In line 1404, the dirtyPartitions are all added to partit
Yes missed on that one, I'll rollback this change.



--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 May 2018 15:53:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410
PS2, Line 1410: partitionsToUpdateFileMdByPath = 
getPartitionsByPath(partitionsToUpdate);
  :   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
> The new change behaves very much like the old code except for the case when
I don't think so. In line 1404, the dirtyPartitions are all added to 
partitionsToRemove. When dirtyPartitions exist, partitionsToRemove won't be 
empty. Thus the if-branch won't be chosen and the else-branch performs the same 
as the old codes.



--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 20 May 2018 01:27:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-19 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

> > (2 comments)
 > >
 > > Just interested in this optimization. May I ask some questions?
 > >
 > > Looks like we are optimizing the case when partitionsToUpdate !=
 > > null and partitions were neither dropped, created. Can we
 > optimize
 > > the case that partitionsToUpdate != null and some partitions are
 > > dropped? For example when an INSERT OVERWRITE statement updates
 > the
 > > majority of the partitions and only drops few of them.
 >
 > Will this case not introduce inconsistency between HMS and Impala?

The problem with optimization is that introduces inconsistency, something which 
my change introduces too, when ALTER TABLE is done and HMS crashes.


--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 May 2018 18:18:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-19 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

> (2 comments)
 >
 > Just interested in this optimization. May I ask some questions?
 >
 > Looks like we are optimizing the case when partitionsToUpdate !=
 > null and partitions were neither dropped, created. Can we optimize
 > the case that partitionsToUpdate != null and some partitions are
 > dropped? For example when an INSERT OVERWRITE statement updates the
 > majority of the partitions and only drops few of them.

Will this case not introduce inconsistency between HMS and Impala?


--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 May 2018 18:15:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-19 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

(2 comments)

> (2 comments)
 >
 > Just interested in this optimization. May I ask some questions?
 >
 > Looks like we are optimizing the case when partitionsToUpdate !=
 > null and partitions were neither dropped, created. Can we optimize
 > the case that partitionsToUpdate != null and some partitions are
 > dropped? For example when an INSERT OVERWRITE statement updates the
 > majority of the partitions and only drops few of them.

Will this case not cause introduce inconsistency between HMS and Impala ?

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409
PS2, Line 1409: size() == 0
> nit: can be simplified by isEmpty()
OK


http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410
PS2, Line 1410: partitionsToUpdateFileMdByPath = 
getPartitionsByPath(partitionsToUpdate);
  :   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
> Looks like the original codes perform the same as these two lines. Since dr
The new change behaves very much like the old code except for the case when 
dirtyPartitions exist in  that case there is an overhead of dropping and 
loading the dirty partitions from Metastore.



--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 May 2018 18:14:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10450 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..


Patch Set 2:

(2 comments)

Just interested in this optimization. May I ask some questions?

Looks like we are optimizing the case when partitionsToUpdate != null and 
partitions were neither dropped, created. Can we optimize the case that 
partitionsToUpdate != null and some partitions are dropped? For example when an 
INSERT OVERWRITE statement updates the majority of the partitions and only 
drops few of them.

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409
PS2, Line 1409: size() == 0
nit: can be simplified by isEmpty()


http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410
PS2, Line 1410: partitionsToUpdateFileMdByPath = 
getPartitionsByPath(partitionsToUpdate);
  :   loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, 
true);
Looks like the original codes perform the same as these two lines. Since 
dropPartitions, loadPartitionsFromMetastore and loadPartitionsFromMetastore 
will return fast when their first parameter is empty. partitionExist is not 
empty means partitionsToUpdate != null, so the original codes will perform as 
these two lines.

My question is what are the HMS requests we save?



--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 May 2018 13:17:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

2018-05-18 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10450


Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only 
operations.
..

IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.

Catalog currently loads the HMS data even, if only the file metadata load
is requested. This change detects some of those updates and tries to skip
the loading of HMS data if only the file metadata load is requested.

Testing:
  Ran the core test without failures.

Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 39 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10450/1
--
To view, visit http://gerrit.cloudera.org:8080/10450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 10450
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh