[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-07-15 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885410#comment-16885410
 ] 

Vihang Karajgaonkar commented on IMPALA-7973:
-

Thats right. This does not add new configurations. In case of partition events, 
if the documentation says it  invalidates the table, with this patch it should 
say it refreshes the partitions from the event. Only applies to the alter, add 
and drop partition events. Hope that helps.

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-07-13 Thread Anurag Mantripragada (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884521#comment-16884521
 ] 

Anurag Mantripragada commented on IMPALA-7973:
--

[~arodoni_cloudera], Before this change, any partition level events would 
invalidate the entire table. After this change, such events selectively refresh 
the partitions which were involved. This applies to create, drop and alter 
partitions.

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-07-13 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884510#comment-16884510
 ] 

Alex Rodoni commented on IMPALA-7973:
-

[~anuragmantri] [~vihangk1] I do not see a mention of a config change user has 
to perform. Could you tell me what to document for this feature? The current 
doc already mentions the partition-level events/updates. Thanks!

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-05-02 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832014#comment-16832014
 ] 

Tim Armstrong commented on IMPALA-7973:
---

[~anuragmantri] can you set the fix version please?

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-04-30 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830765#comment-16830765
 ] 

ASF subversion and git services commented on IMPALA-7973:
-

Commit 3ad5b3fba202bf6809986a86f5e041da656f0a88 in impala's branch 
refs/heads/master from Anurag Mantripragada
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3ad5b3f ]

IMPALA-7973: Add support for fine grained events processing for
partition level HMS events.

This patch adds support for fine grained updates for add/drop/alter
partition events.

Currently, partition events invalidate the table. This can be
expensive for large tables. Here, we refresh affected partitions
in case of add/drop/alter partition events. HMS processes add/drop
partitions in a transaction, which means there may be multiple
partitions affected in a single add/drop event. We try to refresh all
these partitions in a loop. If any of the partition refresh fails,
we throw MetastoreNotificationNeedsInvalidateException to mandate a
manual invalidate for event processing to continue.

Testing:
Modified pre-existing tests for partition events to instead test if
partitions are added/dropped/altered when event processing is enabled.

Change-Id: I213401329f3965dd81055197792ccf8a05368af5
Reviewed-on: http://gerrit.cloudera.org:8080/13111
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-03-29 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805261#comment-16805261
 ] 

Vihang Karajgaonkar commented on IMPALA-7973:
-

Assigning it to you [~anuragmantri] since you are interested in taking this up.

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-03-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790811#comment-16790811
 ] 

ASF subversion and git services commented on IMPALA-7973:
-

Commit 9d67aafaea5ba8673d130dfc7c1e7a0b4f58303b in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9d67aaf ]

IMPALA-7972 Detect self-events to avoid unnecessary invalidates

This patch adds support to detect self-generated events from catalog.
This is used to avoid unnecessary invalidates to the tables from such
self-events. Currently, alter_table, alter_partition, add_partition and
drop_partition event types can invalidate the table metadata.

Originally, we planned to have a global version number support from
metastore (see HIVE-21115). But since that is still not complete, we
rely on a combination of other identifiers to determine if a event is
self-generated or not. These self-event identifiers consists of values
from the table/partition parameters. A catalog service uuid
and the catalog version number. The uuid is generated for each
catalogservice when it comes up and it adds it to the table/partition
parameters with the key "impala.CatalogServiceId". The catalog version
number is added with the key "impala.CatalogVersion".

When catalog executes a DDL operation it appends the current catalog
version to the list of version numbers for the in-flight events for the
table. Events processor clears this version when the corresponding
version number identified by serviceId is received in the event. This is
needed since it is possible that a external non-Impala system which
generates the event presents the same serviceId and version number later
on. The algorithm to detect a self-event is as below.

1. Add the service id and expected catalog version to table/partition
parameters when executing the DDL operation. When the HMS operation is
successful, add the version number to the list of version for in-flight
events at table level.
2. When the event is received, the first time you see the combination of
serviceId and version number, event processor clears the version number
from table's list and determines the event as self-generated (and hence
ignored)
3. If the event data presents a unknown serviceId or if the version
number is not present in the list of in-flight versions, event is not a
self-event and needs to be processed.

In order to limit the total memory footprint, only 10 version numbers
are stored at the table. Since the event processor is expected to poll
every few seconds this should be a reasonable bound which satisfies most
use-cases. Otherwise, event processor may wrongly process a self-event
to invalidate the table. In such a case, its a performance penalty not a
correctness issue.

In case of drop_partition event, the partition object is not available
in the event. Hence we cannot determine if its a self-event. In such
cases currently we always issue a invalidate command. This is a known
limitation and will be improved in IMPALA-7973

Patch adds new tests to trigger alter table/partition DDLs from impala
and makes sure that the table is not invalidated.

Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Reviewed-on: http://gerrit.cloudera.org:8080/12591
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2018-12-13 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720713#comment-16720713
 ] 

Vihang Karajgaonkar commented on IMPALA-7973:
-

Yes, this would introduce a new configuration and it would need documentation 
support so that users are aware of what to expect and how to use this feature.

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2018-12-13 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720538#comment-16720538
 ] 

Alex Rodoni commented on IMPALA-7973:
-

[~vihangk1] Is this a user-facing feature that has a doc impact?

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org