[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899918#action_12899918
 ] 

Namit Jain commented on HIVE-1293:
--

Agreed on the bug in getLockObjects() - will have a new patch.


Filed a new patch for the followup: 
https://issues.apache.org/jira/browse/HIVE-1293

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
 hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt


 Concurrency model for Hive:
 Currently, Hive does not provide a good concurrency model. The only 
 guarantee provided in the case of concurrent readers and writers is that
 a reader will not see partial data from the old version (before the write) and 
 partial data from the new version (after the write).
 This has come across as a big problem, especially for background processes 
 performing maintenance operations.
 The following possible solutions come to mind.
 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
 the query, or the write locks can be delayed until the move
 task (when the directory is actually moved). Care needs to be taken to avoid 
 deadlocks.
 2. Versioning: The writer can create a new version if the current version is 
 being read. Note that this is not equivalent to snapshots:
 the old version can only be accessed by the current readers, and will be 
 deleted when all of them have finished.
 Comments.
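Option 2 can be illustrated with a minimal sketch, assuming a single in-memory registry stands in for the metastore and the filesystem. A writer publishes a new version only when the current one has active readers, and an old version is deleted as soon as its last reader finishes. The class and method names (VersionedTable, beginRead, endRead, write, isLive) are hypothetical and are not part of Hive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reader-reference-counted versions of a single table.
public class VersionedTable {
    private int currentVersion = 0;
    // version number -> number of readers still using that version
    private final Map<Integer, Integer> readers = new HashMap<>();

    // A reader pins the current version and gets its number back.
    public synchronized int beginRead() {
        readers.merge(currentVersion, 1, Integer::sum);
        return currentVersion;
    }

    // When the last reader of a version leaves, the version is dropped
    // (an old version is then deleted; the current one simply stays visible).
    public synchronized void endRead(int version) {
        int remaining = readers.merge(version, -1, Integer::sum);
        if (remaining == 0) {
            readers.remove(version);
            if (version != currentVersion) {
                System.out.println("deleting version " + version);
            }
        }
    }

    // The writer overwrites in place if nobody is reading; otherwise it creates a new version.
    public synchronized int write() {
        if (readers.containsKey(currentVersion)) {
            currentVersion++;
        }
        return currentVersion;
    }

    // An old version remains visible only while some reader still pins it.
    public synchronized boolean isLive(int version) {
        return version == currentVersion || readers.containsKey(version);
    }

    public static void main(String[] args) {
        VersionedTable t = new VersionedTable();
        int v = t.beginRead();   // a reader pins version 0
        int w = t.write();       // version 0 is being read, so the writer creates version 1
        System.out.println(v + " " + w + " " + t.isLive(0));
        t.endRead(v);            // the last reader of version 0 leaves, so it is deleted
        System.out.println(t.isLive(0));
    }
}
```

Unlike a snapshot, nothing here lets a new reader open version 0 once version 1 exists; only readers that pinned it before the write can still see it.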

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900128#action_12900128
 ] 

Namit Jain commented on HIVE-1293:
--

Fixed a lot of bugs, added a lot of comments, and tested it with a ZooKeeper 
cluster of 3 nodes.
select * currently performs a dirty read; we can add a new parameter to change 
that behavior if need be.




 Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
 hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899320#action_12899320
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

a little bummed that locks need to be held for the entire query execution. that 
could mean a writer blocking readers for hours.

hive's query plans seem to be of two distinct stages:
1. read a bunch of stuff, compute intermediate/final data
2. move final data into output locations

i.e. a single query never reads what it writes (into a final output location). 
even if #1 and #2 are mingled today, they can easily be put in order.

in that sense, we only need to get shared locks for all read entities involved 
in #1 to begin with. once phase #1 is done, we can drop all the read locks, 
get the exclusive locks for all the write entities in #2, perform #2 and quit. 
that way exclusive locks are held for a very short duration. i think this 
scheme is similarly deadlock free (there are now two independent lock 
acquire/release phases, and each of them can lock objects in lexicographic order).
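The two-phase scheme can be sketched as follows, assuming in-process java.util.concurrent read/write locks stand in for the ZooKeeper-based locks (the class and method names are hypothetical, not from the patch). Each phase acquires its locks in lexicographic order, and no lock from phase 1 is still held when phase 2 starts acquiring, so two queries cannot end up waiting on each other in a cycle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: local read/write locks stand in for ZooKeeper-based locks.
public class TwoPhaseQuery {
    private static final Map<String, ReentrantReadWriteLock> LOCKS = new ConcurrentHashMap<>();

    private static ReentrantReadWriteLock lockFor(String entity) {
        return LOCKS.computeIfAbsent(entity, k -> new ReentrantReadWriteLock());
    }

    // True while any thread holds the entity in shared or exclusive mode.
    public static boolean isLocked(String entity) {
        ReentrantReadWriteLock rw = lockFor(entity);
        return rw.isWriteLocked() || rw.getReadLockCount() > 0;
    }

    // SortedSet iteration gives lexicographic order, which rules out lock-order cycles.
    private static List<Lock> acquireAll(SortedSet<String> entities, boolean exclusive) {
        List<Lock> held = new ArrayList<>();
        for (String e : entities) {
            ReentrantReadWriteLock rw = lockFor(e);
            Lock l = exclusive ? rw.writeLock() : rw.readLock();
            l.lock();
            held.add(l);
        }
        return held;
    }

    private static void releaseAll(List<Lock> held) {
        for (Lock l : held) {
            l.unlock();
        }
    }

    // Phase 1 computes under shared locks; phase 2 moves results under short-lived exclusive locks.
    public static void run(SortedSet<String> inputs, SortedSet<String> outputs,
                           Runnable compute, Runnable move) {
        List<Lock> shared = acquireAll(inputs, false);
        try {
            compute.run();        // read inputs, write results to temporary locations
        } finally {
            releaseAll(shared);   // read locks are dropped as soon as phase 1 is done
        }
        List<Lock> exclusive = acquireAll(outputs, true);
        try {
            move.run();           // move final data into the output locations
        } finally {
            releaseAll(exclusive);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        run(new TreeSet<>(List.of("src_b", "src_a")), new TreeSet<>(List.of("dest")),
                () -> log.add("phase 1: compute"), () -> log.add("phase 2: move"));
        System.out.println(log);
    }
}
```

This relies on the property stated above that a query never reads its own final output. An entity in both sets would still work here, but only because the read lock is released before the write lock is requested.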




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899347#action_12899347
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

also - i am missing something here:

+  for (WriteEntity output : plan.getOutputs()) {
+    lockObjects.addAll(getLockObjects(output.getTable(), output.getPartition(), HiveLockMode.EXCLUSIVE));
+  }

getLockObjects():

+if (p != null) {
...
+  locks.add(new LockObject(new HiveLockObject(p.getTable()), mode));
+}

doesn't this end up locking the table in exclusive mode if a partition is being 
written to? (whereas the design talks about locking the table in shared mode 
only?)




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899640#action_12899640
 ] 

Namit Jain commented on HIVE-1293:
--

The partition being written to is locked in exclusive mode - the table should 
be locked in shared mode.
The write entity should only consist of the partition.

There might be a bug there - https://issues.apache.org/jira/browse/HIVE-1548 
should populate the inputs and outputs appropriately.
I will start on this now.
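The intended lock set for a write entity (partition EXCLUSIVE, parent table SHARED) can be shown in a small sketch. The types here (LockMode, LockRequest) and the "table/partition" naming are hypothetical stand-ins, not the patch's HiveLockObject API, and the example table and partition names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the intended lock set for a write entity; not the patch's actual code.
// Requires Java 16+ for records.
public class LockObjects {
    enum LockMode { SHARED, EXCLUSIVE }

    record LockRequest(String object, LockMode mode) { }

    // partition == null means the write targets the whole table.
    static List<LockRequest> getLockObjects(String table, String partition, LockMode mode) {
        List<LockRequest> locks = new ArrayList<>();
        if (partition != null) {
            // The partition being written gets the requested mode...
            locks.add(new LockRequest(table + "/" + partition, mode));
            // ...but the parent table is only locked SHARED, so other partitions stay usable.
            locks.add(new LockRequest(table, LockMode.SHARED));
        } else {
            locks.add(new LockRequest(table, mode));
        }
        return locks;
    }

    public static void main(String[] args) {
        System.out.println(getLockObjects("page_view", "ds=2010-08-17", LockMode.EXCLUSIVE));
    }
}
```

The bug pointed out above corresponds to passing `mode` instead of `LockMode.SHARED` for the table entry.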




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899647#action_12899647
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

can you check the getLockObjects() routine? it seemed to me that even if you called 
it with a partition in X mode, it would add the table to the list of objects to be 
locked as well (in the same X mode).

i think we should, at least as a follow-on, make the optimization of not locking 
write entities for the duration of the query.




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897917#action_12897917
 ] 

John Sichi commented on HIVE-1293:
--

I think ZK default client port would be 2181; see HBASE-2305.
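A minimal sketch of such a fallback, assuming java.util.Properties stands in for HiveConf (the constant names are hypothetical). 2181 is the client port conventionally used by ZooKeeper.

```java
import java.util.Properties;

// Hypothetical sketch: fall back to ZooKeeper's conventional client port when none is configured.
public class ZkPortDefault {
    static final String PORT_KEY = "hive.zookeeper.client.port";
    static final int DEFAULT_ZK_CLIENT_PORT = 2181;

    static int clientPort(Properties conf) {
        String value = conf.getProperty(PORT_KEY, String.valueOf(DEFAULT_ZK_CLIENT_PORT));
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(clientPort(conf));   // prints 2181: nothing set, default used
        conf.setProperty(PORT_KEY, "2222");
        System.out.println(clientPort(conf));   // prints 2222: an explicit value wins
    }
}
```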





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897946#action_12897946
 ] 

John Sichi commented on HIVE-1293:
--

From testing:  the parsed lock mode seems to be case-sensitive:

hive> lock table blah shared;
Failed with exception No enum const class org.apache.hadoop.hive.ql.lockmgr.HiveLockMode.shared
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

If I use lock table blah SHARED it works.
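The error comes from Enum.valueOf, which matches constant names exactly. A minimal sketch of case-insensitive parsing, assuming a local stand-in enum with the two modes used in this thread (the real class is org.apache.hadoop.hive.ql.lockmgr.HiveLockMode; the helper name here is hypothetical):

```java
import java.util.Locale;

// Hypothetical sketch: a stand-in for HiveLockMode, parsed case-insensitively.
public class LockModeParsing {
    enum HiveLockMode { SHARED, EXCLUSIVE }

    static HiveLockMode parse(String text) {
        // Enum.valueOf is case-sensitive, so normalize first.
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return HiveLockMode.valueOf(text.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parse("shared"));
        System.out.println(parse("Exclusive"));
        try {
            HiveLockMode.valueOf("shared");   // the unnormalized call that fails in the CLI
        } catch (IllegalArgumentException e) {
            System.out.println("raw valueOf rejects lower case: " + e.getMessage());
        }
    }
}
```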





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897953#action_12897953
 ] 

Basab Maulik commented on HIVE-1293:


Re: One lib question: Zookeeper

hbase-handler with hbase 0.20.x does not work with zk 3.3.1 but works fine with 
the version it ships with, zk 3.2.2. Have not investigated what breaks.




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897977#action_12897977
 ] 

John Sichi commented on HIVE-1293:
--

Namit, I tried testing with a standalone ZooKeeper via the CLI.  Locking a table 
succeeded, but then SHOW LOCKS didn't show anything, and UNLOCK said the lock 
didn't exist.

I think the reason is that the CLI creates a new Driver for each statement 
executed, and when the old Driver is closed, the lock manager is closed along 
with it (closing the ZooKeeper client instance).  As a result, locks are 
released immediately after LOCK TABLE is executed.

When I tested with a thrift server plus two JDBC clients, all was well.  I was 
able to take a lock from one client and prevent the other client from getting 
the same lock.  So I guess the thrift server is keeping one Driver around per 
connection.





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897988#action_12897988
 ] 

John Sichi commented on HIVE-1293:
--

Here's a scenario which is not working correctly.  (Tested with thrift server 
plus JDBC clients.)

Existing table foo.

Client 1:  LOCK TABLE foo EXCLUSIVE;

Client 2:  DROP TABLE foo;

According to the doc, the DROP TABLE should fail, but it succeeds.  Same is 
true for LOAD DATA.  Probably the same reason in both cases:  for these 
commands we don't register the output in the PREHOOK (only the POSTHOOK).  
INSERT is getting blocked correctly since it's in the PREHOOK.
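A small in-memory sketch shows why the DROP TABLE gets through. It assumes, as the comment suggests, that the lock set is derived from the outputs a statement registers before it runs: an output that is only known afterwards never reaches the conflict check. The class, methods and the single-process lock table are hypothetical, not Hive's lock manager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: locks are only checked for entities a statement declares before it runs.
public class PreExecutionLocking {
    enum Mode { SHARED, EXCLUSIVE }

    // object -> modes currently held on it (a real lock manager would keep these in ZooKeeper)
    private final Map<String, List<Mode>> held = new HashMap<>();

    // Only SHARED/SHARED is compatible; EXCLUSIVE conflicts with everything.
    static boolean compatible(Mode existing, Mode requested) {
        return existing == Mode.SHARED && requested == Mode.SHARED;
    }

    synchronized boolean tryLock(String object, Mode requested) {
        List<Mode> holders = held.computeIfAbsent(object, k -> new ArrayList<>());
        for (Mode existing : holders) {
            if (!compatible(existing, requested)) {
                return false;
            }
        }
        holders.add(requested);
        return true;
    }

    synchronized void unlock(String object, Mode mode) {
        List<Mode> holders = held.get(object);
        if (holders != null) {
            holders.remove(mode);   // removes one holder in that mode
        }
    }

    // A statement may run only if every declared output can be locked EXCLUSIVE up front;
    // on a conflict, locks taken so far are rolled back.
    synchronized boolean lockOutputs(List<String> declaredOutputs) {
        List<String> acquired = new ArrayList<>();
        for (String out : declaredOutputs) {
            if (!tryLock(out, Mode.EXCLUSIVE)) {
                for (String a : acquired) {
                    unlock(a, Mode.EXCLUSIVE);
                }
                return false;
            }
            acquired.add(out);
        }
        return true;
    }

    public static void main(String[] args) {
        PreExecutionLocking mgr = new PreExecutionLocking();
        mgr.tryLock("foo", Mode.EXCLUSIVE);                    // client 1: LOCK TABLE foo EXCLUSIVE
        System.out.println(mgr.lockOutputs(List.of("foo")));   // prints false: declared up front, blocked
        System.out.println(mgr.lockOutputs(List.of()));        // prints true: output known only later
    }
}
```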






[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898010#action_12898010
 ] 

John Sichi commented on HIVE-1293:
--

After seeing some other issues, had a chat with Namit about semantics; here's 
what we worked out.

* Normally, locks should only be held for the duration of statement execution.
* However, LOCK TABLE should take a global lock (not tied to any particular 
session or statement).
* UNLOCK TABLE should remove both kinds of lock (statement-level and global).  
Likewise, SHOW LOCKS shows all.
* For fetching results, we'll need a parameter to control whether a dirty read 
is possible.  Normally, this is not an issue since we're fetching from saved 
temp results, but when using select * from t to fetch directly from the 
original table, this behavior makes a difference.  To prevent dirty reads, 
we'll need the statement-level lock to span the duration of the fetch.

To avoid leaks, we need to make sure that once we create a ZooKeeper client, we 
always close it.
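The agreed lifetimes can be modeled in a single-threaded, in-memory sketch. A hypothetical Session object stands in for one ZooKeeper client: statement-level locks disappear when it is closed, while a LOCK TABLE lock is global and survives until UNLOCK TABLE. ZooKeeper itself is not used, the dirty-read parameter is not modeled, and all names are illustrative. The try-with-resources block shows one way to guarantee that a client, once created, is always closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-threaded model of the agreed lock semantics; ZooKeeper is not used.
public class LockSemantics {
    static final String GLOBAL = "<global>";

    // object -> owner (a session id, or GLOBAL for LOCK TABLE)
    private final Map<String, String> locks = new HashMap<>();

    // Stands in for one ZooKeeper client: closing it releases its statement-level locks.
    final class Session implements AutoCloseable {
        private final String id;
        private final List<String> statementLocks = new ArrayList<>();

        Session(String id) {
            this.id = id;
        }

        boolean lockForStatement(String object) {
            if (locks.putIfAbsent(object, id) != null) {
                return false;   // a conflicting lock is already present
            }
            statementLocks.add(object);
            return true;
        }

        @Override
        public void close() {
            for (String object : statementLocks) {
                locks.remove(object, id);   // skip objects already removed by UNLOCK TABLE
            }
            statementLocks.clear();
        }
    }

    Session openSession(String id) {
        return new Session(id);
    }

    // LOCK TABLE: a global lock not tied to any session or statement.
    boolean lockTable(String object) {
        return locks.putIfAbsent(object, GLOBAL) == null;
    }

    // UNLOCK TABLE: removes the lock whichever kind it is.
    void unlockTable(String object) {
        locks.remove(object);
    }

    // SHOW LOCKS: lists both kinds, sorted by object name.
    Map<String, String> showLocks() {
        return new TreeMap<>(locks);
    }

    public static void main(String[] args) {
        LockSemantics mgr = new LockSemantics();
        mgr.lockTable("foo");
        try (Session s = mgr.openSession("stmt-1")) {   // closed even if the statement fails
            s.lockForStatement("bar");
            System.out.println(mgr.showLocks());         // {bar=stmt-1, foo=<global>}
        }
        System.out.println(mgr.showLocks());             // {foo=<global>}
        mgr.unlockTable("foo");
        System.out.println(mgr.showLocks());             // {}
    }
}
```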





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898012#action_12898012
 ] 

John Sichi commented on HIVE-1293:
--

Also, as a followup, we need to add client info, such as the hostname and process ID, to 
SHOW LOCKS.
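A minimal sketch of the owner description a lock could carry so that SHOW LOCKS can name its holder. It assumes Java 9+ for ProcessHandle. The field layout and the idea of storing this string as the lock node's data are assumptions, not part of the patch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: owner details that SHOW LOCKS could display next to each lock.
public class LockOwnerInfo {
    static String describeOwner(String queryId) {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host";   // still record the lock even if name lookup fails
        }
        long pid = ProcessHandle.current().pid();   // Java 9+
        return "host=" + host + " pid=" + pid + " query=" + queryId;
    }

    public static void main(String[] args) {
        System.out.println(describeOwner("example-query-id"));
    }
}
```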





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897548#action_12897548
 ] 

John Sichi commented on HIVE-1293:
--

Two configuration questions:

* You have hive.support.concurrency=true in hive-default.xml.  Probably we want 
it false instead (only on during tests) since most people using Hive won't have 
a ZooKeeper quorum set up?

* Isn't there a default value we can use for hive.zookeeper.client.port?

One lib question:

* ZooKeeper is now available from Maven.  Maybe we should delete the jar in 
hbase-handler/lib and get it via Ivy instead of adding it to the top-level lib? 
The version we have checked in is 3.2.2, but the version available from Maven is 3.3.x, 
so we'd need to test to make sure everything (including hbase-handler) still 
works with the newer version.

http://mvnrepository.com/artifact/org.apache.hadoop/zookeeper

Two cleanups:

* In QTestUtil.java, you left the following code commented out; can we get rid 
of it?

+  //  for (int i = 0; i < qfiles.length; i++) {
+  //qsetup[i].tearDown();
+  //  }

* In DDLTask.java, you left some commented-out debugging code (two instances):

+//console.printError("conflicting lock present " + tbl + " cannot be locked in mode " + mode);




 Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
 hive.1293.4.patch, hive_leases.txt





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-11 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897577#action_12897577
 ] 

Namit Jain commented on HIVE-1293:
--

Did the cleanups and changed the default value of hive.support.concurrency to false.

Not sure how we can set a default value for hive.zookeeper.client.port?


Let us do the lib cleanup in a follow-up - I will file a JIRA.


 Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
 hive.1293.4.patch, hive_leases.txt





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896182#action_12896182
 ] 

John Sichi commented on HIVE-1293:
--

Added comments here:

https://review.cloudera.org/r/563/

(they don't seem to be getting posted back to this issue)


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch, hive.1293.2.patch, hive_leases.txt





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-07-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885631#action_12885631
 ] 

He Yongqiang commented on HIVE-1293:


I am going to commit this patch in the next few days. Please post your comments 
if you have any, so we can fix them before this patch gets in.

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch, hive_leases.txt





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-07-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884316#action_12884316
 ] 

Namit Jain commented on HIVE-1293:
--

1. No, two clients cannot get conflicting locks; ZooKeeper guarantees that. 
I will go through it again to double-check.
2. If client A cannot get the lock, it will unlock only its own locks. However, 
there is no security: if client A explicitly unlocks table A, all locks on 
A are released.
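Point 2 distinguishes two release paths: a client backing out releases only the locks it owns, while an explicit unlock of an object drops every lock on it regardless of owner. The in-memory sketch below illustrates just that distinction; `InMemoryLockManager` and its methods are hypothetical, and conflict detection is deliberately omitted.

```python
from collections import defaultdict


class InMemoryLockManager:
    """Hypothetical sketch: each lock on an object is recorded with its owner."""

    def __init__(self):
        self.locks = defaultdict(list)  # object name -> list of (owner, mode)

    def lock(self, owner, obj, mode):
        self.locks[obj].append((owner, mode))

    def release_own(self, owner):
        # A client that cannot get all its locks removes only the ones it holds.
        for obj in list(self.locks):
            self.locks[obj] = [(o, m) for (o, m) in self.locks[obj] if o != owner]
            if not self.locks[obj]:
                del self.locks[obj]

    def unlock_object(self, obj):
        # Explicit unlock: no ownership check, so every lock on obj is dropped.
        self.locks.pop(obj, None)

    def holders(self, obj):
        return [o for (o, _) in self.locks.get(obj, [])]
```

The lack of an ownership check in `unlock_object` is exactly the "no security" caveat: any client can clear another client's locks.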

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-07-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884342#action_12884342
 ] 

Namit Jain commented on HIVE-1293:
--

Confirmed point 1.

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-07-01 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884343#action_12884343
 ] 

Prasad Chakka commented on HIVE-1293:
-

Is this the same as https://issues.apache.org/jira/browse/HIVE-829 ?



 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-06-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884111#action_12884111
 ] 

Namit Jain commented on HIVE-1293:
--

The unit tests are running right now (they should succeed); I have submitted a 
patch for review.


Also, all the jar files (3 of them) in hbase-handler/lib should be moved to 
lib.
That is not part of the patch, since those files are binary.

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-06-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884112#action_12884112
 ] 

He Yongqiang commented on HIVE-1293:


I will take a look.

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-06-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884165#action_12884165
 ] 

He Yongqiang commented on HIVE-1293:


A few questions so far:
1) Can the lock implementation guarantee atomicity? Since the locking logic 
runs on the client side, it seems possible that two concurrent clients get 
conflicting locks.
2) About releasing locks: if a client does an unlock, will it also release locks 
made by other clients? That is, if client A takes a lock, then client B takes 
another lock, and then client B unlocks, will client A still hold its lock?

Still reading the code.
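On question 1, the usual answer for ZooKeeper-based locks is that the decision logic runs on the client, but ordering comes from the server: every client creates a sequential child node, and the server assigns strictly increasing sequence numbers. Under the standard ZooKeeper shared-lock recipe, a write lock is granted only if no lower-numbered node exists, and a read lock only if no lower-numbered write node exists. The sketch below simulates that recipe in memory; `FakeZNodeDir` and `is_granted` are hypothetical, a mutex stands in for the server's total ordering, and whether the patch follows this recipe exactly is an assumption. The real recipe also waits on a watch instead of just checking once.

```python
import itertools
import threading


class FakeZNodeDir:
    """Stands in for a ZooKeeper parent znode. The 'server' hands out strictly
    increasing sequence numbers; that total order is what makes the check safe."""

    def __init__(self):
        self._seq = itertools.count()
        self._mutex = threading.Lock()
        self.children = {}  # sequence number -> "read" or "write"

    def create_sequential(self, mode):
        with self._mutex:
            seq = next(self._seq)
            self.children[seq] = mode
            return seq

    def delete(self, seq):
        with self._mutex:
            self.children.pop(seq, None)

    def snapshot(self):
        with self._mutex:
            return dict(self.children)


def is_granted(znodes, seq):
    """Shared-lock recipe check: a write lock needs no lower node at all;
    a read lock needs no lower write node."""
    children = znodes.snapshot()
    mode = children[seq]
    lower = [m for s, m in children.items() if s < seq]
    if mode == "write":
        return not lower
    return "write" not in lower
```

Because no two clients can receive the same sequence number, at most one writer can ever see itself as the lowest node, however many clients race.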


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1293.1.patch





[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-05-05 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864541#action_12864541
 ] 

John Sichi commented on HIVE-1293:
--

Right, you get this if, for partition p.q.r in t, you add the following to the 
flat lock list:

t (S)
t.p (S)
t.p.q (S)
t.p.q.r (S or X depending what the operation is)

This doesn't add many extra locks in general, since there are more children 
than parents; it makes the low-level recipe a little simpler, and may make the 
SHOW LOCKS output clearer.

It might be exactly what you are already proposing, in which case we're in 
agreement.
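The expansion described above is mechanical: every ancestor of the target object is locked in S mode, and only the target itself gets the requested mode. A minimal sketch, using the dotted naming from the comment; `lock_list` is a hypothetical helper, not a Hive API.

```python
def lock_list(table, partition, mode):
    """Hypothetical helper: expand a lock request on a table or partition into
    a flat list of (object, mode) pairs: ancestors in S, the target in `mode`."""
    if mode not in ("S", "X"):
        raise ValueError("mode must be 'S' or 'X'")
    parts = partition.split(".") if partition else []
    names = [table]
    for p in parts:
        names.append(names[-1] + "." + p)
    # All ancestors are shared; only the last (most specific) object gets `mode`.
    return [(n, "S") for n in names[:-1]] + [(names[-1], mode)]
```

For example, `lock_list("t", "p.q.r", "X")` yields t, t.p and t.p.q in S and t.p.q.r in X, matching the list above; a request on the table alone (`partition=""`) is a single entry.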


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-05-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864554#action_12864554
 ] 

Namit Jain commented on HIVE-1293:
--

Agreed, this is cleaner than what I had. I was checking the parents; you are 
suggesting locking the parents in 'S' mode, which achieves the desired effect 
but removes the need for the lock manager to understand the hierarchy.

It is even better given that we may have different lock manager 
implementations. I will update the wiki.
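With parents locked in S mode, a lock manager only needs a per-object compatibility check (S is compatible with S; anything involving X conflicts) and no notion of parent and child. The sketch below shows that check on hand-written lock lists; `conflicts` and the example operations are hypothetical illustrations.

```python
COMPATIBLE = {("S", "S")}  # only shared/shared is compatible


def conflicts(held, requested):
    """Hypothetical flat check: a request conflicts if any object it names is
    already held in an incompatible mode. No parent/child logic is needed."""
    held_modes = {}
    for name, mode in held:
        held_modes.setdefault(name, []).append(mode)
    for name, mode in requested:
        for h in held_modes.get(name, []):
            if (h, mode) not in COMPATIBLE:
                return True
    return False


# An insert into partition p.q.r of t, with its parents locked in S mode.
insert_into_partition = [("t", "S"), ("t.p", "S"), ("t.p.q", "S"), ("t.p.q.r", "X")]
drop_table = [("t", "X")]
read_other_partition = [("t", "S"), ("t.p", "S"), ("t.p.q", "S"), ("t.p.q.s", "S")]
```

Dropping t conflicts with the running insert only because t is held in S by that insert, while a reader of a sibling partition proceeds: the hierarchy is encoded in the lock lists rather than in the lock manager.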

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-05-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864145#action_12864145
 ] 

Namit Jain commented on HIVE-1293:
--

The initial writeup is at http://wiki.apache.org/hadoop/Hive/Locking.
Please comment.


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-05-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863510#action_12863510
 ] 

Namit Jain commented on HIVE-1293:
--

One option is to use ZooKeeper for locking; we don't need to worry about leases, 
since ZooKeeper supports ephemeral nodes.
The ZooKeeper quorum can be specified via configuration parameters, and they 
must be set for concurrency to be enabled.
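The reason ephemeral nodes remove the need for leases is that the server ties each such node to the client's session and deletes it when the session ends or times out, so a crashed client cannot leak a lock. The in-memory model below illustrates only that behaviour; `FakeZooKeeper`, its methods and the `/locks/t` path are hypothetical, and `FileExistsError` merely stands in for ZooKeeper's "node exists" error.

```python
class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that created it
        self._next_session = 0

    def open_session(self):
        self._next_session += 1
        return self._next_session

    def create_ephemeral(self, session, path):
        # Creating an existing path fails, so holding the node acts as a lock.
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = session

    def expire_session(self, session):
        # What the server does when a client crashes and its session times out:
        # every ephemeral node owned by that session disappears.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}
```

Hive itself never has to track expiry times; cleanup after a client crash is delegated to the ZooKeeper session timeout.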

 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-04-14 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857110#action_12857110
 ] 

Ashish Thusoo commented on HIVE-1293:
-

I would vote for versioning. Since we do not have to deal with the complexity 
of a buffer cache, I think this would be much simpler to implement than it is 
in traditional databases. At the same time, for locks we would have to use a 
lease-based mechanism anyway, to protect against locks leaking when clients 
crash. Once you account for that, locking does not seem significantly simpler 
to implement than versioning.
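A lease-based lock, as referred to above, grants the lock for a bounded time; the holder must keep renewing it, and if the holder crashes, the lease simply runs out and another client can take over. A minimal sketch of an exclusive lease lock, assuming a caller-supplied clock so expiry can be shown deterministically; `LeaseLock` is a hypothetical class, not part of any Hive patch.

```python
class LeaseLock:
    """Hypothetical lease-based exclusive lock with an injectable clock,
    illustrating how a crashed client's lock stops blocking others."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock  # callable returning the current time in seconds
        self.holder = None
        self.expires_at = 0.0

    def _expired(self):
        return self.clock() >= self.expires_at

    def acquire(self, client):
        # Free, already ours, or the previous holder's lease has run out.
        if self.holder is None or self.holder == client or self._expired():
            self.holder = client
            self.expires_at = self.clock() + self.lease_seconds
            return True
        return False

    def renew(self, client):
        # Only the current holder can extend its lease, and only before expiry.
        if self.holder == client and not self._expired():
            self.expires_at = self.clock() + self.lease_seconds
            return True
        return False

    def release(self, client):
        if self.holder == client:
            self.holder = None
```

The cost this comment points to is visible here: every holder must renew before `lease_seconds` elapse, and the lease length trades off recovery time after a crash against renewal traffic.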


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
