Re: ICV concurrency problem (?)

2010-06-12 Thread Ted Yu
For #3, take a look at http://en.wikipedia.org/wiki/Network_Time_Protocol

On Sat, Jun 12, 2010 at 2:22 PM, Mark Laffoon mlaff...@semanticresearch.com
 wrote:

 I'm not having a lot of success figuring out the pattern. I am most
 definitely not seeing stack traces in any of the logs. I'm not seeing any
 errors in my app logs, although I haven't scoured every log from every
 hadoop/mapreduce/hbase agent in the system (I really need to centralize
 those logs).

 However, I have an HBase question that might be related: how are
 timestamps handled/generated?

 1. I have multiple clients (map/reduce task executors) hitting an HBase
 cluster with multiple region servers. Assuming the client code doesn't
 explicitly set the timestamp, which box actually generates the timestamp
 for a put?

 2. If a timestamp for a put (either generated by the client or by whatever
 box) is older than the most recent, and I have maxVersions set to 1, does
 the put get ignored?

 3. If I have an HBase cluster, and the times of the various machines
 aren't in sync, am I just asking for trouble? What do most people do to
 keep their machines in sync?

 Thanks,
 Mark

 -----Original Message-----
 From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
 Sent: Saturday, June 12, 2010 9:59 AM
 To: user@hbase.apache.org
 Subject: Re: ICV concurrency problem (?)

 On Fri, Jun 11, 2010 at 4:26 PM, Mark Laffoon
 mlaff...@semanticresearch.com wrote:
  The other thing I didn't mention: I ran the 80x12 test a few more
  times. Sometimes it works, and sometimes it doesn't (sigh). Could there
  be an issue with data being moved around regions?
 

 So, when it doesn't work, can you figure out the difference?  Are tasks
 failing?  Are there exceptions in the hbase/tasktracker logs?

 St.Ack



RE: ICV concurrency problem (?)

2010-06-12 Thread Andrew Purtell
 From: Mark Laffoon
 Subject: RE: ICV concurrency problem (?)

 1. I have multiple clients (map/reduce task executors)
 hitting an HBase cluster with multiple region servers.
 Assuming the client code doesn't explicitly set the
 timestamp, which box actually generates the timestamp
 for a put? 

The region server(s) servicing the put.
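
To make that concrete, here is a toy Python model of the rule. It is not the HBase API: `ToyRegionServer` and its `put` method are made up for illustration, and the fixed clock value is arbitrary. The only point is that the server's clock is consulted when, and only when, the client leaves the timestamp unset.

```python
import time


class ToyRegionServer:
    """Hypothetical stand-in for a region server; not HBase code."""

    def __init__(self, clock=None):
        # clock returns milliseconds since the epoch; injectable for testing
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, timestamp=None):
        # If the client left the timestamp unset, the server's clock decides.
        ts = self.clock() if timestamp is None else timestamp
        self.cells.setdefault((row, column), []).append((ts, value))
        return ts


# Pin the server clock to an arbitrary value so the result is repeatable.
server = ToyRegionServer(clock=lambda: 1_276_380_000_000)
implicit_ts = server.put("row1", "cf:count", 1)                # server-assigned
explicit_ts = server.put("row1", "cf:count", 2, timestamp=42)  # client-assigned
```

In this sketch, two clients writing through the same server get comparable timestamps, while a client that sets its own timestamp bypasses the server clock entirely.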

 2. If a timestamp for a put (either generated by the client
 or by whatever box) is older than the most recent, and I
 have maxVersions set to 1, does the put get ignored?

I can tell you what I think, but there have been changes in this area since
0.20.3, so I'm no longer sure. I'd like to hear an answer to this as well (and
need to go digging in the code).

 3. If I have an HBase cluster, and the times of the various
 machines aren't in sync, am I just asking for trouble?

Yes.

As noted for #1, the region server sets the timestamp if the client does not.

If a region migrates from one RS to another and their clocks are out of sync,
new writes can get timestamps older than ones already stored: the region will
be time traveling.
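
A small simulation may help picture this. It is a sketch under stated assumptions, not HBase code: `SkewedServer`, `latest`, and the 5-second skew are all hypothetical, and the read rule is simplified to "highest timestamp wins".

```python
class SkewedServer:
    """Hypothetical region server whose clock is off by a fixed number of ms."""

    def __init__(self, name, skew_ms):
        self.name = name
        self.skew_ms = skew_ms

    def now(self, true_time_ms):
        # What this server believes the time is at a given true time.
        return true_time_ms + self.skew_ms


def latest(versions):
    # Simplified read with one version kept: the highest timestamp wins.
    return max(versions, key=lambda cell: cell[0])[1]


rs_a = SkewedServer("rs-a", skew_ms=5000)  # clock runs 5 s fast
rs_b = SkewedServer("rs-b", skew_ms=0)     # clock is correct

versions = []
# True time 1000 ms: the region lives on rs-a and a client writes "old".
versions.append((rs_a.now(1000), "old"))
# True time 2000 ms: the region has moved to rs-b and a client writes "new".
versions.append((rs_b.now(2000), "new"))

winner = latest(versions)
```

Here the later write carries the smaller timestamp, so a read still sees "old" even though "new" was written afterwards.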

If you delete something but your delete is in the past relative to timestamps 
on the puts, the delete will be ignored. 
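
The same effect for deletes, again as a toy model rather than HBase code. The helper `visible_cells` is hypothetical, and it assumes the rule that a delete marker hides only cells whose timestamp is less than or equal to the marker's own timestamp.

```python
def visible_cells(puts, delete_ts):
    """Return the puts that survive a delete marker stamped delete_ts.

    Assumed rule for this sketch: the marker hides every cell whose
    timestamp is less than or equal to the marker's timestamp.
    """
    return [(ts, value) for ts, value in puts if ts > delete_ts]


# A put stamped by a server whose clock runs fast.
puts = [(6000, "written by fast-clock server")]

# Delete stamped by a server with a slower clock: it lands "in the past".
survivors_past = visible_cells(puts, delete_ts=2000)

# Delete stamped later than the put: it works as expected.
survivors_ok = visible_cells(puts, delete_ts=7000)
```

With the delete stamped at 2000, the put at 6000 stays visible; only a delete stamped at or after 6000 removes it.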

That should paint a picture. 

 What do most people do to keep their machines in sync?

NTP

   - Andy
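
For reference, the clock offset that NTP estimates comes from four timestamps per exchange. The sketch below shows the standard offset and round-trip delay arithmetic; the function name and the sample values are illustrative assumptions, and a real deployment would simply run an NTP daemon on every node rather than compute this by hand.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the reply was sent
    t3: client clock when the reply was received
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative values in milliseconds: the server clock is about 5 s ahead.
offset, delay = ntp_offset_and_delay(t0=1000, t1=6200, t2=6300, t3=1500)
```

A positive offset means the local clock is behind the server's; the delay is the network round trip with the server's processing time removed.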