[jira] [Comment Edited] (HIVE-18940) Hive notifications serialize all write DDL operations

2018-03-15 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400948#comment-16400948
 ] 

Vihang Karajgaonkar edited comment on HIVE-18940 at 3/15/18 8:17 PM:
-

I think maybe I didn't make my suggestion very clear. I did not suggest using 
auto-increment for the commit id. Auto-increment is just used to uniquely identify 
an event. This is useful because we are now adding tons of information to one 
event, so instead of creating one giant event we can create several small 
events. The relationship between a commit id and event ids is one-to-many: there 
could be many event ids associated with one commit id. The commit id itself is 
a monotonically increasing global number generated just before a transaction 
does an actual commit. Here is the pseudo code of my suggestion above:

Modify NOTIFICATION_LOG to add a new field CommitID

Modify NOTIFICATION_LOG table such that EVENT_ID is an auto-increment

Remove the use of NOTIFICATION_SEQUENCE (or perhaps we can reuse it to generate 
commit id)

Now here is what the methods generating events might look like:
{code:java}
public void someMethodWhichGeneratesEvents() {
  // events collected over the whole (possibly nested) transaction
  List<Event> allEventsInThisTransaction = new ArrayList<>();
  openTransaction(allEventsInThisTransaction);
  // do some work which adds events to allEventsInThisTransaction
  // openTransaction again (nested); pass the same allEventsInThisTransaction
  openTransaction(allEventsInThisTransaction);
  // dummy commit (closes the nested transaction only)
  commit(allEventsInThisTransaction);
  // actual commit
  commit(allEventsInThisTransaction);
}
{code}
The commit would look like this:
  
{code:java}
public boolean commit(List<Event> eventsInThisTransaction) {
  counter--;
  // if this is an actual commit
  if (counter == 0) {
    // generate one monotonically increasing number from the database,
    // using the same logic currently used to generate the event id from
    // the NOTIFICATION_SEQUENCE table; the lock is held for a constant time
    // irrespective of how many events this transaction generates
    commitId = getCommitID();
    // add events
    for (Event event : eventsInThisTransaction) {
      event.setCommitId(commitId);
      addNotificationEvent(event);
    }
    return pm.commit();
  }
  // nested (dummy) commit: nothing is written yet (return value assumed)
  return true;
}
{code}
 

So if the commit fails, no events are generated. If the commit succeeds, all the 
events are generated with unique event ids and the same commit id. If there is a 
hole in the commit ids, it means that the corresponding transaction failed, and 
the hole is guaranteed never to be filled.
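
To make the scheme above concrete, here is a minimal in-memory sketch. It is not Hive code: the class `CommitIdLogSketch`, the `Txn` holder, and the `succeed` flag (which stands in for the database commit succeeding or failing) are all hypothetical names chosen for illustration, and the two counters only imitate an auto-increment EVENT_ID column and a NOTIFICATION_SEQUENCE-style row.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for NOTIFICATION_LOG; every name here is illustrative. */
final class CommitIdLogSketch {
    /** One row: a unique event id plus the commit id shared by its transaction. */
    static final class Row {
        final long eventId;
        final long commitId;
        final String message;

        Row(long eventId, long commitId, String message) {
            this.eventId = eventId;
            this.commitId = commitId;
            this.message = message;
        }
    }

    /** Per-transaction state: nesting depth and the events buffered so far. */
    static final class Txn {
        int depth = 0;
        final List<String> pending = new ArrayList<>();
    }

    private final List<Row> rows = new ArrayList<>();
    private long nextEventId = 1;  // imitates an auto-increment EVENT_ID column
    private long nextCommitId = 1; // imitates a single sequence row in the database

    void openTransaction(Txn txn) {
        txn.depth++;
    }

    /** Returns true only when the outermost commit actually wrote its rows. */
    synchronized boolean commit(Txn txn, boolean succeed) {
        if (txn.depth <= 0) {
            throw new IllegalStateException("commit without an open transaction");
        }
        txn.depth--;
        if (txn.depth > 0) {
            return false; // nested ("dummy") commit: nothing is written yet
        }
        long commitId = nextCommitId++; // one id per transaction, taken just before commit
        if (!succeed) {
            txn.pending.clear();
            return false; // failed commit: its commit id stays a permanent hole
        }
        for (String message : txn.pending) {
            rows.add(new Row(nextEventId++, commitId, message));
        }
        txn.pending.clear();
        return true;
    }

    List<Row> rows() {
        return rows;
    }

    public static void main(String[] args) {
        CommitIdLogSketch log = new CommitIdLogSketch();

        Txn t1 = new Txn();
        log.openTransaction(t1);
        t1.pending.add("CREATE_TABLE");
        log.openTransaction(t1); // nested
        t1.pending.add("ADD_PARTITION");
        log.commit(t1, true);    // dummy commit
        log.commit(t1, true);    // actual commit

        Txn t2 = new Txn();
        log.openTransaction(t2);
        t2.pending.add("DROP_TABLE");
        log.commit(t2, false);   // simulated failure

        Txn t3 = new Txn();
        log.openTransaction(t3);
        t3.pending.add("ALTER_TABLE");
        log.commit(t3, true);

        for (Row r : log.rows()) {
            System.out.println("eventId=" + r.eventId + " commitId=" + r.commitId + " " + r.message);
        }
    }
}
```

In this run the first transaction writes two rows that share commit id 1, the failed transaction writes nothing, and the last transaction gets commit id 3, so commit id 2 is a hole that is never filled.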


> Hive notifications serialize all write DDL operations
> -
>
> Key: HIVE-18940
> URL: https://issues.apache.org/jira/browse/HIVE-18940
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alexander Kolbasov
>Priority: M

[jira] [Comment Edited] (HIVE-18940) Hive notifications serialize all write DDL operations

2018-03-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397463#comment-16397463
 ] 

Thejas M Nair edited comment on HIVE-18940 at 3/13/18 6:54 PM:
---

For replication purposes, and perhaps for capturing Sentry delta updates as well, 
the EVENT_ID has to be in the order of commit.
For example, if EVENT_ID 5 has been written and then consumed by the 
replication program, it would then only look for rows where EVENT_ID > 5. So if 
there are two concurrent transactions writing new rows and the one with 
EVENT_ID 5 commits before the one with EVENT_ID 4, then EVENT_ID 4 would get missed.
Holes would be OK; what is not OK is for another application to see the row 
with EVENT_ID 5 become visible before the one with EVENT_ID 4.
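
The race can be replayed deterministically with a small sketch. This is only an illustration under stated assumptions: `MissedEventSketch` is a hypothetical class, the sorted set stands in for the committed rows of the notification table, and `poll()` imitates a consumer that remembers the highest EVENT_ID it has seen and only asks for larger ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Replays the out-of-order commit problem; all names are illustrative. */
final class MissedEventSketch {
    private final TreeSet<Long> visible = new TreeSet<>(); // committed EVENT_IDs
    private long lastSeen;                                 // consumer's high-water mark

    MissedEventSketch(long lastSeen) {
        this.lastSeen = lastSeen;
    }

    /** A transaction holding this EVENT_ID commits, making the row visible. */
    void commit(long eventId) {
        visible.add(eventId);
    }

    /** Returns visible ids with EVENT_ID > lastSeen, then advances lastSeen. */
    List<Long> poll() {
        List<Long> batch = new ArrayList<>(visible.tailSet(lastSeen, false));
        if (!batch.isEmpty()) {
            lastSeen = batch.get(batch.size() - 1);
        }
        return batch;
    }

    public static void main(String[] args) {
        MissedEventSketch log = new MissedEventSketch(3); // consumer has already seen up to 3
        log.commit(5);                   // the transaction with EVENT_ID 5 commits first
        System.out.println(log.poll()); // prints [5]
        log.commit(4);                   // EVENT_ID 4 becomes visible afterwards
        System.out.println(log.poll()); // prints []
    }
}
```

The second poll returns nothing because the consumer now only looks past 5, which is exactly how EVENT_ID 4 gets missed.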

A DB-generated timestamp has the same issue, unless it can represent the commit 
sequence.

I believe the use of a database auto-increment field was considered in HIVE-16886 
and it did not meet this criterion.

cc [~anishek]


> Hive notifications serialize all write DDL operations
> -
>
> Key: HIVE-18940
> URL: https://issues.apache.org/jira/browse/HIVE-18940
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alexander Kolbasov
>Priority: Major
>
> The implementation of DbNotificationListener uses a single row to store the 
> current notification ID and uses {{SELECT FOR UPDATE}} to lock the row. This 
> serializes all write DDL operations, which isn't good.
> We should consider using database auto-increment for the notification ID instead. 
> On MySQL/InnoDB in particular, it is supported natively with relatively 
> lightweight locking. 
> This creates a potential issue for consumers, though, because such IDs may have 
> holes. There are two types of holes: transient holes for transactions that 
> have not committed yet and will be committed shortly, and permanent holes for 
> transactions that fail. Consumers need to deal with both. It may be useful to 
> add a DB-generated timestamp as well to assist in recovery from holes.


