Re: Version in HBase

2014-11-11 Thread Krishna Kalyan
For example, for table 'test_table', the values inserted are:

Row1 - Val1 => t
Row1 - Val2 => t + 3
Row1 - Val3 => t + 5

Row2 - Val1 => t
Row2 - Val2 => t + 3
Row2 - Val3 => t + 5

a scan of 'test_table' with version = t + 4 should return
Row1 - Val2 => t + 3
Row2 - Val2 => t + 3

How do I achieve timestamp-based scans?
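
A minimal sketch of such a timestamp-bounded scan with the standard HBase client
API; the table name follows the example above, the cutoff value is illustrative,
and Scan.setTimeRange takes a half-open interval, so [0, cutoff + 1) returns, per
cell, the newest version with timestamp <= cutoff:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimestampScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");   // 0.98-era client API
        long cutoff = Long.parseLong(args[0]);           // e.g. t + 4

        Scan scan = new Scan();
        scan.setTimeRange(0L, cutoff + 1);  // half-open: only versions with ts <= cutoff
        scan.setMaxVersions(1);             // newest qualifying version per cell

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(r);
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }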

Thanks and Regards,
Krishna




On Wed, Nov 12, 2014 at 10:56 AM, Krishna Kalyan 
wrote:

> Hi,
> Is it possible to do a
> select * from  where version = "somedate" using the HBase APIs?
> (scanning for values where version <= "somedate")
> Could you please direct me to the appropriate links to achieve this?
>
>
> Regards,
> Krishna
>
>
>


Version in HBase

2014-11-11 Thread Krishna Kalyan
Hi,
Is it possible to do a
select * from  where version = "somedate" using the HBase APIs?
(scanning for values where version <= "somedate")
Could you please direct me to the appropriate links to achieve this?


Regards,
Krishna


Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-11 Thread Liu, Ming (HPIT-GADSC)
Hi, all,

I am trying to use YCSB to test our HBase 0.98.5 instance and got a strange
result: update throughput is 6x better than read. It is just an exercise, so HBase
is running on a workstation in standalone mode.
I modified the workloada file shipped with YCSB into two new workloads, workloadr
and workloadu, where workloadr does 100% read operations and workloadu does 100%
update operations. The workloadr and workloadu config files are at the bottom for
your reference.

I found that read performance is much worse than update performance; read
throughput is about 6,000 ops/sec:

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr -p 
columnfamily=family -s -t
[OVERALL], RunTime(ms), 16565.0
[OVERALL], Throughput(ops/sec), 6036.824630244491

And update throughput is about 36,000 ops/sec, 6x better than read.

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu -p 
columnfamily=family -s -t
[OVERALL], RunTime(ms), 2767.0
[OVERALL], Throughput(ops/sec), 36140.22406938923

Is this possible? IMHO, read should be faster than update.
Maybe I got something wrong in the workload files? Or is it actually possible for
update to be faster than read? I can't find a YCSB mailing list; if anyone knows of
one, please give me a link so I can also ask the question there. But is it possible
that put is faster than get in HBase? If not, the result must be wrong and I need
to debug the YCSB code to figure out what is going wrong.
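
As a sanity check outside YCSB, a minimal sketch that times puts against gets
directly with the 0.98 client API; the column family follows the command line
above, while the table and qualifier names are assumptions. Note that puts are
buffered client-side when autoflush is off, which by itself can make update
throughput look far higher than read throughput:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetVsPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");   // assumed YCSB table name
        byte[] cf = Bytes.toBytes("family");
        int n = 10000;
        // table.setAutoFlush(false);  // uncomment to batch puts client-side

        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
          Put p = new Put(Bytes.toBytes("user" + (i % 1000)));
          p.add(cf, Bytes.toBytes("field0"), Bytes.toBytes("v" + i));
          table.put(p);
        }
        table.flushCommits();                           // push any buffered writes
        long putMs = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
          table.get(new Get(Bytes.toBytes("user" + (i % 1000))));  // each get is one RPC
        }
        long getMs = System.currentTimeMillis() - start;

        System.out.println(n + " puts: " + putMs + " ms, " + n + " gets: " + getMs + " ms");
        table.close();
      }
    }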

Workloadr:
recordcount=10
operationcount=10
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian

workloadu:
recordcount=10
operationcount=10
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=1
scanproportion=0
insertproportion=0
requestdistribution=zipfian


Thanks,
Ming


RE: Themis : implements cross-row/cross-table transaction on HBase.

2014-11-11 Thread 崔建伟
Hi Stack:
  Thanks for your attention and work :). I will continue to improve Themis and
report new progress.

Best
jianwei

From: saint@gmail.com  on behalf of Stack 

Sent: Wednesday, November 12, 2014 12:15 AM
To: HBase Dev List
Cc: user@hbase.apache.org
Subject: Re: Themis : implements cross-row/cross-table transaction on HBase.

Thanks for updating the list with the nice Themis updates Jianwei. I added
Themis to the powered by list (and the other missing transactions managers,
Tephra and Haeinsa).
St.Ack

On Mon, Nov 10, 2014 at 12:50 AM, 崔建伟  wrote:

> Hi everyone:
> In the last few months, we have updated Themis to achieve better performance
> and include more features:
>
> 1. Improved single-row write performance from 23% (relative to HBase's native
> put) to 60% (for most test cases). For a single-row write transaction, we only
> write the lock to the MemStore in the prewrite phase; then, in the commit phase,
> we erase the corresponding lock and write the data and commit information to the
> HLog. This doesn't break the correctness of the percolator algorithm and improves
> performance a lot for single-row writes.
>
> 2. Support for HBase 0.98. We created a branch,
> https://github.com/XiaoMi/themis/tree/for_hbase_0.98, to make Themis
> support HBase 0.98 (currently HBase 0.98.5). All the functions of the
> master branch will also be implemented in this branch.
>
> 3. Transaction TTL support and old data cleaning. Users can set TTLs for
> read and write transactions respectively. Old data that can no longer be read
> will then be cleaned periodically.
>
> 4. MapReduce support. We implemented a group of classes to support reading data
> via Themis transactions in a Mapper job and writing data via Themis transactions
> in a Reducer job.
>
> For more details, please see GitHub:
> https://github.com/XiaoMi/themis (or
> https://github.com/XiaoMi/themis/tree/for_hbase_0.98) or the JIRA:
> https://issues.apache.org/jira/browse/HBASE-10999 . If you find Themis
> interesting, please leave us a comment on the mailing list, JIRA, or GitHub.
>
> Best
> cuijianwei
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Sunday, July 13, 2014 1:12 PM
> To: HBase Dev List
> Cc: user@hbase.apache.org
> Subject: Re: Themis : implements cross-row/cross-table transaction on
> HBase.
>
> On Tue, Jul 8, 2014 at 12:34 AM, 崔建伟  wrote:
>
> > Hi everyone, I want to introduce our open-source project Themis, which
> > implements cross-row/cross-table transactions on HBase.
> >
> > Themis follows Google's percolator algorithm (
> > http://research.google.com/pubs/pub36726.html), which provides
> > ACID-compliant transactions and snapshot isolation. The cross-row
> > transaction is based on HBase's single-row atomic semantics and doesn't use
> > a central transaction server, so it supports linear scalability.
> >
> > Themis depends on a timestamp server to provide globally, strictly
> > incrementing timestamps that define the order of transactions, which are
> > used to resolve write-write and read-write conflicts. The timestamp
> > server is lightweight and can achieve high throughput (500,000+ qps),
> > and Themis batches timestamp requests across transactions into one RPC, so
> > that it won't become the bottleneck of the system even when processing
> > billions of transactions every day.
> >
> > Although Themis could be implemented entirely on the client side, we adopt
> > HBase's coprocessor framework to achieve higher performance. Themis
> > includes a client-side library that provides transaction APIs, such as
> > themisPut/themisGet/themisScan/themisDelete, and a coprocessor library
> > loaded on the regionserver. Therefore, Themis can be used without changing
> > the code and logic of HBase.
> >
> > We have been validating the correctness of Themis for a few months with an
> > AccountTransfer simulation program, which concurrently does cross-row
> > transactions by transferring money among different accounts (each account is
> > a row in HBase) and verifies that the total money across all accounts doesn't
> > change during the simulation. We have also run Themis in our production
> > environment.
> >
> > We tested the performance of Themis and got results comparable to
> > percolator's.
> > The single-column transaction represents the worst performance case for
> > Themis compared with HBase; the results are:
> > 1) For read, the performance of percolator is 90% of HBase;
> > 2) For write, the performance of percolator is 23% of HBase.
> > The write performance drops a lot because Themis uses a two-phase commit
> > protocol to achieve transactional ACID. For multi-row writes, we improve
> > performance by parallelizing all writes of the prewrite phase. For
> > single-row writes, we are optimizing the two-phase commit protocol to achieve
> > better performance and will update the results when they are ready. The
> > details of the performance results can be found on GitHub.
> >
> > The repository and introdu
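
To make the percolator-style flow described above concrete, here is a conceptual,
in-memory sketch of the prewrite and commit phases. It is not the Themis API (see
the GitHub user guide for that); every name in it is a hypothetical illustration:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class PercolatorSketch {
      // Each cell keeps three versioned "columns": data, lock, write --
      // mirroring percolator's layout on top of an MVCC store like HBase.
      static class Cell {
        TreeMap<Long, String> data = new TreeMap<Long, String>();
        TreeMap<Long, String> lock = new TreeMap<Long, String>();  // value = primary cell
        TreeMap<Long, Long> write  = new TreeMap<Long, Long>();    // commitTs -> startTs
      }

      final Map<String, Cell> table = new HashMap<String, Cell>();

      Cell cell(String name) {
        Cell c = table.get(name);
        if (c == null) { c = new Cell(); table.put(name, c); }
        return c;
      }

      // Prewrite phase: write data + lock at startTs; fail on a write-write conflict.
      boolean prewrite(String name, String value, long startTs, String primary) {
        Cell c = cell(name);
        if (!c.lock.isEmpty()) return false;                  // another txn holds a lock
        if (!c.write.isEmpty() && c.write.lastKey() >= startTs) return false;  // newer commit
        c.data.put(startTs, value);
        c.lock.put(startTs, primary);
        return true;
      }

      // Commit phase: erase the lock and expose the data via a write record at commitTs.
      void commit(String name, long startTs, long commitTs) {
        Cell c = cell(name);
        c.lock.remove(startTs);
        c.write.put(commitTs, startTs);
      }

      public static void main(String[] args) {
        PercolatorSketch db = new PercolatorSketch();
        long startTs = 10, commitTs = 11;   // would come from the timestamp server
        // Cross-row transfer; "accountA:balance" is chosen as the primary cell.
        boolean ok = db.prewrite("accountA:balance", "90", startTs, "accountA:balance")
                  && db.prewrite("accountB:balance", "110", startTs, "accountA:balance");
        if (ok) {
          db.commit("accountA:balance", startTs, commitTs);  // committing the primary decides the txn
          db.commit("accountB:balance", startTs, commitTs);
        }
        System.out.println("committed: " + ok);
      }
    }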

Re: what can cause RegionTooBusyException?

2014-11-11 Thread Qiang Tian
or:

  LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
      "store files; delaying flush up to " + this.blockingWaitTime + "ms");

something like:

WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
occurrence,\x17\xF1o\x9C,1340981109494.ecb85155563c6614e5448c7d700b909e.
has too many store files; delaying flush up to 9ms



On Wed, Nov 12, 2014 at 10:26 AM, Qiang Tian  wrote:

> The checkResources Ted mentioned is a good suspect; see the online HBase book,
> "9.7.7.7.1.1. Being Stuck".
> Did you see the message below in your RS log?
>
>   LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
>       "ms on a compaction to clean up 'too many store files'; waited " +
>       "long enough... proceeding with flush of " +
>       region.getRegionNameAsString());
>
>
> I did a quick test setting "hbase.hregion.memstore.block.multiplier" = 0;
> issuing a put in the hbase shell triggers a flush and throws a region-too-busy
> exception to the client, and the retry mechanism makes it succeed in the next
> multi RPC call.
>
>
>
> On Wed, Nov 12, 2014 at 1:21 AM, Brian Jeltema <
> brian.jelt...@digitalenvoy.net> wrote:
>
>> Thanks. I appear to have resolved this problem by restarting the HBase
>> Master and the RegionServers
>> that were reporting the failure.
>>
>> Brian
>>
>> On Nov 11, 2014, at 12:13 PM, Ted Yu  wrote:
>>
>> > For your first question, region server web UI,
>> > rs-status#regionRequestStats, shows Write Request Count.
>> >
>> > You can monitor the value for the underlying region to see if it
>> receives
>> > above-normal writes.
>> >
>> > Cheers
>> >
>> > On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema 
>> wrote:
>> >
>> >>> Was the region containing this row hot around the time of failure ?
>> >>
>> >> How do I measure that?
>> >>
>> >>>
>> >>> Can you check region server log (along with monitoring tool) what
>> >> memstore pressure was ?
>> >>
>> >> I didn't see anything in the region server logs to indicate a problem.
>> And
>> >> given the
>> >> reproducibility of the behavior, it's hard to see how dynamic
>> parameters
>> >> such as
>> >> memory pressure could be at the root of the problem.
>> >>
>> >> Brian
>> >>
>> >> On Nov 10, 2014, at 3:22 PM, Ted Yu  wrote:
>> >>
>> >>> Was the region containing this row hot around the time of failure ?
>> >>>
>> >>> Can you check region server log (along with monitoring tool) what
>> >> memstore pressure was ?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
>> >> brian.jelt...@digitalenvoy.net> wrote:
>> >>>
>> > How many tasks may write to this row concurrently ?
>> 
>>  only 1 mapper should be writing to this row. Is there a way to check
>> >> which
>>  locks are being held?
>> 
>> > Which 0.98 release are you using ?
>> 
>>  0.98.0.2.1.2.1-471-hadoop2
>> 
>>  Thanks
>>  Brian
>> 
>>  On Nov 10, 2014, at 2:21 PM, Ted Yu  wrote:
>> 
>> > There could be more than one reason where RegionTooBusyException is
>> >> thrown.
>> > Below are two (from HRegion):
>> >
>> > * We throw RegionTooBusyException if above memstore limit
>> > * and expect client to retry using some kind of backoff
>> > */
>> > private void checkResources()
>> >
>> > * Try to acquire a lock.  Throw RegionTooBusyException
>> >
>> > * if failed to get the lock in time. Throw InterruptedIOException
>> >
>> > * if interrupted while waiting for the lock.
>> >
>> > */
>> >
>> > private void lock(final Lock lock, final int multiplier)
>> >
>> > How many tasks may write to this row concurrently ?
>> >
>> > Which 0.98 release are you using ?
>> >
>> > Cheers
>> >
>> > On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
>> > brian.jelt...@digitalenvoy.net> wrote:
>> >
>> >> I’m running a map/reduce job against a table that is performing a
>> >> large
>> >> number of writes (probably updating every row).
>> >> The job is failing with the exception below. This is a solid
>> failure;
>> >> it
>> >> dies at the same point in the application,
>> >> and at the same row in the table. So I doubt it’s a conflict with
>> >> compaction (and the UI shows no compaction in progress),
>> >> or that there is a load-related cause.
>> >>
>> >> ‘hbase hbck’ does not report any inconsistencies. The
>> >> ‘waitForAllPreviousOpsAndReset’ leads me to suspect that
>> >> there is an operation in progress that is hung and blocking the
>> update. I
>> >> don’t see anything suspicious in the HBase logs.
>> >> The data at the point of failure is not unusual, and is identical
>> to
>> >> many
>> >> preceding rows.
>> >> Does anybody have any ideas of what I should look for to find the
>> >> cause of
>> >> this RegionTooBusyException?
>> >>
>> >> This is Hadoop 2.4 and HBase 0.98.
>> >>
>> >> 14/11/

Re: what can cause RegionTooBusyException?

2014-11-11 Thread Qiang Tian
The checkResources Ted mentioned is a good suspect; see the online HBase book,
"9.7.7.7.1.1. Being Stuck".
Did you see the message below in your RS log?

  LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
      "ms on a compaction to clean up 'too many store files'; waited " +
      "long enough... proceeding with flush of " +
      region.getRegionNameAsString());


I did a quick test setting "hbase.hregion.memstore.block.multiplier" = 0;
issuing a put in the hbase shell triggers a flush and throws a region-too-busy
exception to the client, and the retry mechanism makes it succeed in the next
multi RPC call.
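
Since the server-side comment quoted further down in this thread says the client
is expected to retry with backoff when RegionTooBusyException is thrown, the
client retry settings are also worth checking on a failing job. A minimal sketch;
the values and the table name are illustrative, not defaults or recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RetryTunedClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Client-side retry/backoff knobs (illustrative values).
        conf.setInt("hbase.client.retries.number", 20);  // retries before giving up
        conf.setInt("hbase.client.pause", 200);          // base pause (ms); backoff grows from this
        HTable table = new HTable(conf, "my_table");     // "my_table" is a placeholder
        // ... issue puts as usual; busy regions are retried longer before the job fails
        table.close();
      }
    }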



On Wed, Nov 12, 2014 at 1:21 AM, Brian Jeltema <
brian.jelt...@digitalenvoy.net> wrote:

> Thanks. I appear to have resolved this problem by restarting the HBase
> Master and the RegionServers
> that were reporting the failure.
>
> Brian
>
> On Nov 11, 2014, at 12:13 PM, Ted Yu  wrote:
>
> > For your first question, region server web UI,
> > rs-status#regionRequestStats, shows Write Request Count.
> >
> > You can monitor the value for the underlying region to see if it receives
> > above-normal writes.
> >
> > Cheers
> >
> > On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema 
> wrote:
> >
> >>> Was the region containing this row hot around the time of failure ?
> >>
> >> How do I measure that?
> >>
> >>>
> >>> Can you check region server log (along with monitoring tool) what
> >> memstore pressure was ?
> >>
> >> I didn't see anything in the region server logs to indicate a problem.
> And
> >> given the
> >> reproducibility of the behavior, it's hard to see how dynamic parameters
> >> such as
> >> memory pressure could be at the root of the problem.
> >>
> >> Brian
> >>
> >> On Nov 10, 2014, at 3:22 PM, Ted Yu  wrote:
> >>
> >>> Was the region containing this row hot around the time of failure ?
> >>>
> >>> Can you check region server log (along with monitoring tool) what
> >> memstore pressure was ?
> >>>
> >>> Thanks
> >>>
> >>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
> >> brian.jelt...@digitalenvoy.net> wrote:
> >>>
> > How many tasks may write to this row concurrently ?
> 
>  only 1 mapper should be writing to this row. Is there a way to check
> >> which
>  locks are being held?
> 
> > Which 0.98 release are you using ?
> 
>  0.98.0.2.1.2.1-471-hadoop2
> 
>  Thanks
>  Brian
> 
>  On Nov 10, 2014, at 2:21 PM, Ted Yu  wrote:
> 
> > There could be more than one reason where RegionTooBusyException is
> >> thrown.
> > Below are two (from HRegion):
> >
> > * We throw RegionTooBusyException if above memstore limit
> > * and expect client to retry using some kind of backoff
> > */
> > private void checkResources()
> >
> > * Try to acquire a lock.  Throw RegionTooBusyException
> >
> > * if failed to get the lock in time. Throw InterruptedIOException
> >
> > * if interrupted while waiting for the lock.
> >
> > */
> >
> > private void lock(final Lock lock, final int multiplier)
> >
> > How many tasks may write to this row concurrently ?
> >
> > Which 0.98 release are you using ?
> >
> > Cheers
> >
> > On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
> > brian.jelt...@digitalenvoy.net> wrote:
> >
> >> I’m running a map/reduce job against a table that is performing a
> >> large
> >> number of writes (probably updating every row).
> >> The job is failing with the exception below. This is a solid
> failure;
> >> it
> >> dies at the same point in the application,
> >> and at the same row in the table. So I doubt it’s a conflict with
> >> compaction (and the UI shows no compaction in progress),
> >> or that there is a load-related cause.
> >>
> >> ‘hbase hbck’ does not report any inconsistencies. The
> >> ‘waitForAllPreviousOpsAndReset’ leads me to suspect that
> >> there is an operation in progress that is hung and blocking the
> update. I
> >> don’t see anything suspicious in the HBase logs.
> >> The data at the point of failure is not unusual, and is identical to
> >> many
> >> preceding rows.
> >> Does anybody have any ideas of what I should look for to find the
> >> cause of
> >> this RegionTooBusyException?
> >>
> >> This is Hadoop 2.4 and HBase 0.98.
> >>
> >> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
> >> attempt_1415210751318_0010_m_000314_1, Status : FAILED
> >> Error:
> >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> >> Failed
> >> 1744 actions: RegionTooBusyException: 1744 times,
> >> at
> >>
> >>
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.client.AsyncProcess

Re: convert hbase Result into an easy-to-parse string

2014-11-11 Thread Ted Yu
Currently, Result's toString() calls the toString() method of each Cell
(KeyValue) it contains.
KeyValue's toString() method is implemented like this:

  return keyToString(this.bytes, this.offset + ROW_OFFSET, getKeyLength())
      + "/vlen=" + getValueLength() + "/seqid=" + seqId;

Meaning the value is not part of the String.

FYI
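
A minimal sketch of building such a string yourself from a Result, since
toString() omits the value (0.96+/0.98 client API; it assumes row keys,
qualifiers, and values are printable strings -- binary values would need hex or
Base64):

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public final class ResultFormatter {
      // Builds "row-key: family1:qualifier1:value1; family2:qualifier2:value2; ..."
      public static String toParseableString(Result result) {
        StringBuilder sb = new StringBuilder(Bytes.toString(result.getRow())).append(": ");
        for (Cell cell : result.rawCells()) {
          sb.append(Bytes.toString(CellUtil.cloneFamily(cell))).append(':')
            .append(Bytes.toString(CellUtil.cloneQualifier(cell))).append(':')
            .append(Bytes.toString(CellUtil.cloneValue(cell))).append("; ");
        }
        return sb.toString();
      }

      private ResultFormatter() { }
    }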

On Tue, Nov 11, 2014 at 2:46 PM, Jian Feng 
wrote:

> I see the 'Result' class has a toString() method. What I need is a
> serialized form of the object, so that I can process the key-values later
> on. The problem I am trying to solve is that I need to convert the object
> to a string, transfer this string to Python, and then let Python code
> process the HBase result.
>
> This will be used in a PySpark program, but unfortunately I don't have much
> knowledge of Java or even Scala. I can write simple Java/Scala programs if
> you can give me some help on how to do this.
>
> I'd like something like:
>
> row-key: column-family1:column-qualifier1:value1;
> column-family2:column-qualifier2:value2;
>
> Thanks!


convert hbase Result into an easy-to-parse string

2014-11-11 Thread Jian Feng
I see the 'Result' class has a toString() method. What I need is a serialized
form of the object, so that I can process the key-values later on. The problem
I am trying to solve is that I need to convert the object to a string,
transfer this string to Python, and then let Python code process the HBase
result.

This will be used in a PySpark program, but unfortunately I don't have much
knowledge of Java or even Scala. I can write simple Java/Scala programs if you
can give me some help on how to do this.

I'd like something like:

row-key: column-family1:column-qualifier1:value1;
column-family2:column-qualifier2:value2;

Thanks!

Re: I'm studying hbase with php, and I wonder whether getRow guarantees sequential order.

2014-11-11 Thread greenblue
Thank you for the reply.

It helps me enough!



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/I-m-studying-hbase-with-php-and-I-wonder-getRow-guarantee-sequential-order-tp4065833p4065861.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: I'm studying hbase with php, and I wonder whether getRow guarantees sequential order.

2014-11-11 Thread Michael Segel
Not sure of the question.
A scan will return multiple rows in sequential order. Note that it is sequential
byte-stream order.

The columns will also be returned in sequential order…

So if you have a set of columns named ‘foo’+timestamp, then within the set of foo
columns they will be in order with the oldest data first.
If you create a set of columns named ‘bar’+(epoch - timestamp), then within the
set of bar columns they will be in order with the youngest data first.

Note that all the columns in the set of bar+… will come before the columns in
foo+… (since 'bar' sorts before 'foo' in byte order).
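
A small sketch that simulates that byte ordering with the HBase Bytes comparator;
it assumes fixed-width (zero-padded) timestamps in the qualifier, which the naming
scheme implicitly relies on:

    import java.util.TreeSet;
    import org.apache.hadoop.hbase.util.Bytes;

    public class QualifierOrdering {
      public static void main(String[] args) {
        // HBase returns a row's cells sorted by family, then qualifier, in raw byte order.
        TreeSet<byte[]> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
        long[] timestamps = {1415600000L, 1415700000L, 1415800000L};
        long max = 9999999999L;
        for (long ts : timestamps) {
          qualifiers.add(Bytes.toBytes(String.format("foo%010d", ts)));        // oldest first
          qualifiers.add(Bytes.toBytes(String.format("bar%010d", max - ts)));  // youngest first
        }
        for (byte[] q : qualifiers) {
          System.out.println(Bytes.toString(q));  // all bar... lines print before all foo...
        }
      }
    }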

HTH


On Nov 10, 2014, at 7:36 PM, greenblue  wrote:

> When I call the function 'getRow', it returns an array.
> But I couldn't find any documentation about the order of the data.
> 
> For instance,
> Presume that a column family is 'c' and qualifiers start from 'c:000' to
> 'c:100'.
> And when I call the function like below
> 
> $rowarr = getRow($table, $rowkey);
> 
> Does a result guarantee sequential order like below?
> Thank you in advance.
> 
> $rowarr[0] has always 'c:' and 'c:000' ~ 'c:100'
> $rowarr[1] has always 'c:101' ~ 'c:200'
> ...
> $rowarr[n] has always 'c:090' ~ 'c:100'
> 
> --
> HBase.php code
> --
> 
>  public function getRow($tableName, $row, $attributes)
>  {
>$this->send_getRow($tableName, $row, $attributes);
>return $this->recv_getRow();
>  }
> 
>  public function send_getRow($tableName, $row, $attributes)
>  {
>$args = new \Hbase\Hbase_getRow_args();
>$args->tableName = $tableName;
>$args->row = $row;
>$args->attributes = $attributes;
>$bin_accel = ($this->output_ instanceof TBinaryProtocolAccelerated) &&
> function_exists('thrift_protocol_write_binary');
>if ($bin_accel)
>{
>  thrift_protocol_write_binary($this->output_, 'getRow',
> TMessageType::CALL, $args, $this->seqid_, $this->output_->isStrictWrite());
>}
>else
>{
>  $this->output_->writeMessageBegin('getRow', TMessageType::CALL,
> $this->seqid_);
>  $args->write($this->output_);
>  $this->output_->writeMessageEnd();
>  $this->output_->getTransport()->flush();
>}
>  }
> 
>  public function recv_getRow()
>  {
>$bin_accel = ($this->input_ instanceof TBinaryProtocolAccelerated) &&
> function_exists('thrift_protocol_read_binary');
>if ($bin_accel) $result = thrift_protocol_read_binary($this->input_,
> '\Hbase\Hbase_getRow_result', $this->input_->isStrictRead());
>else
>{
>  $rseqid = 0;
>  $fname = null;
>  $mtype = 0;
> 
>  $this->input_->readMessageBegin($fname, $mtype, $rseqid);
>  if ($mtype == TMessageType::EXCEPTION) {
>$x = new TApplicationException();
>$x->read($this->input_);
>$this->input_->readMessageEnd();
>throw $x;
>  }
>  $result = new \Hbase\Hbase_getRow_result();
>  $result->read($this->input_);
>  $this->input_->readMessageEnd();
>}
>if ($result->success !== null) {
>  return $result->success;
>}
>if ($result->io !== null) {
>  throw $result->io;
>}
>throw new \Exception("getRow failed: unknown result");
>  }
> 
> 
> 
> --
> View this message in context: 
> http://apache-hbase.679495.n3.nabble.com/I-m-studying-hbase-with-php-and-I-wonder-getRow-guarantee-sequential-order-tp4065833.html
> Sent from the HBase User mailing list archive at Nabble.com.
> 



Re: what can cause RegionTooBusyException?

2014-11-11 Thread Brian Jeltema
Thanks. I appear to have resolved this problem by restarting the HBase Master 
and the RegionServers
that were reporting the failure.

Brian

On Nov 11, 2014, at 12:13 PM, Ted Yu  wrote:

> For your first question, region server web UI,
> rs-status#regionRequestStats, shows Write Request Count.
> 
> You can monitor the value for the underlying region to see if it receives
> above-normal writes.
> 
> Cheers
> 
> On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema  wrote:
> 
>>> Was the region containing this row hot around the time of failure ?
>> 
>> How do I measure that?
>> 
>>> 
>>> Can you check region server log (along with monitoring tool) what
>> memstore pressure was ?
>> 
>> I didn't see anything in the region server logs to indicate a problem. And
>> given the
>> reproducibility of the behavior, it's hard to see how dynamic parameters
>> such as
>> memory pressure could be at the root of the problem.
>> 
>> Brian
>> 
>> On Nov 10, 2014, at 3:22 PM, Ted Yu  wrote:
>> 
>>> Was the region containing this row hot around the time of failure ?
>>> 
>>> Can you check region server log (along with monitoring tool) what
>> memstore pressure was ?
>>> 
>>> Thanks
>>> 
>>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
>> brian.jelt...@digitalenvoy.net> wrote:
>>> 
> How many tasks may write to this row concurrently ?
 
 only 1 mapper should be writing to this row. Is there a way to check
>> which
 locks are being held?
 
> Which 0.98 release are you using ?
 
 0.98.0.2.1.2.1-471-hadoop2
 
 Thanks
 Brian
 
 On Nov 10, 2014, at 2:21 PM, Ted Yu  wrote:
 
> There could be more than one reason where RegionTooBusyException is
>> thrown.
> Below are two (from HRegion):
> 
> * We throw RegionTooBusyException if above memstore limit
> * and expect client to retry using some kind of backoff
> */
> private void checkResources()
> 
> * Try to acquire a lock.  Throw RegionTooBusyException
> 
> * if failed to get the lock in time. Throw InterruptedIOException
> 
> * if interrupted while waiting for the lock.
> 
> */
> 
> private void lock(final Lock lock, final int multiplier)
> 
> How many tasks may write to this row concurrently ?
> 
> Which 0.98 release are you using ?
> 
> Cheers
> 
> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
> brian.jelt...@digitalenvoy.net> wrote:
> 
>> I’m running a map/reduce job against a table that is performing a
>> large
>> number of writes (probably updating every row).
>> The job is failing with the exception below. This is a solid failure;
>> it
>> dies at the same point in the application,
>> and at the same row in the table. So I doubt it’s a conflict with
>> compaction (and the UI shows no compaction in progress),
>> or that there is a load-related cause.
>> 
>> ‘hbase hbck’ does not report any inconsistencies. The
>> ‘waitForAllPreviousOpsAndReset’ leads me to suspect that
>> there is an operation in progress that is hung and blocking the update. I
>> don’t see anything suspicious in the HBase logs.
>> The data at the point of failure is not unusual, and is identical to
>> many
>> preceding rows.
>> Does anybody have any ideas of what I should look for to find the
>> cause of
>> this RegionTooBusyException?
>> 
>> This is Hadoop 2.4 and HBase 0.98.
>> 
>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
>> Error:
>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> Failed
>> 1744 actions: RegionTooBusyException: 1744 times,
>> at
>> 
>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>> at
>> 
>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>> at
>> 
>> org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>> at
>> 
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
>> 
>> Brian
 
>>> 
>> 
>> 



Re: what can cause RegionTooBusyException?

2014-11-11 Thread Ted Yu
For your first question, region server web UI,
rs-status#regionRequestStats, shows Write Request Count.

You can monitor the value for the underlying region to see if it receives
above-normal writes.

Cheers

On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema  wrote:

> > Was the region containing this row hot around the time of failure ?
>
> How do I measure that?
>
> >
> > Can you check region server log (along with monitoring tool) what
> memstore pressure was ?
>
> I didn't see anything in the region server logs to indicate a problem. And
> given the
> reproducibility of the behavior, it's hard to see how dynamic parameters
> such as
> memory pressure could be at the root of the problem.
>
> Brian
>
> On Nov 10, 2014, at 3:22 PM, Ted Yu  wrote:
>
> > Was the region containing this row hot around the time of failure ?
> >
> > Can you check region server log (along with monitoring tool) what
> memstore pressure was ?
> >
> > Thanks
> >
> > On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
> brian.jelt...@digitalenvoy.net> wrote:
> >
> >>> How many tasks may write to this row concurrently ?
> >>
> >> only 1 mapper should be writing to this row. Is there a way to check
> which
> >> locks are being held?
> >>
> >>> Which 0.98 release are you using ?
> >>
> >> 0.98.0.2.1.2.1-471-hadoop2
> >>
> >> Thanks
> >> Brian
> >>
> >> On Nov 10, 2014, at 2:21 PM, Ted Yu  wrote:
> >>
> >>> There could be more than one reason where RegionTooBusyException is
> thrown.
> >>> Below are two (from HRegion):
> >>>
> >>> * We throw RegionTooBusyException if above memstore limit
> >>> * and expect client to retry using some kind of backoff
> >>> */
> >>> private void checkResources()
> >>>
> >>> * Try to acquire a lock.  Throw RegionTooBusyException
> >>>
> >>> * if failed to get the lock in time. Throw InterruptedIOException
> >>>
> >>> * if interrupted while waiting for the lock.
> >>>
> >>> */
> >>>
> >>> private void lock(final Lock lock, final int multiplier)
> >>>
> >>> How many tasks may write to this row concurrently ?
> >>>
> >>> Which 0.98 release are you using ?
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
> >>> brian.jelt...@digitalenvoy.net> wrote:
> >>>
>  I’m running a map/reduce job against a table that is performing a
> large
>  number of writes (probably updating every row).
>  The job is failing with the exception below. This is a solid failure;
> it
>  dies at the same point in the application,
>  and at the same row in the table. So I doubt it’s a conflict with
>  compaction (and the UI shows no compaction in progress),
>  or that there is a load-related cause.
> 
>  ‘hbase hbck’ does not report any inconsistencies. The
>  ‘waitForAllPreviousOpsAndReset’ leads me to suspect that
>  there is an operation in progress that is hung and blocking the update. I
>  don’t see anything suspicious in the HBase logs.
>  The data at the point of failure is not unusual, and is identical to
> many
>  preceding rows.
>  Does anybody have any ideas of what I should look for to find the
> cause of
>  this RegionTooBusyException?
> 
>  This is Hadoop 2.4 and HBase 0.98.
> 
>  14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>  attempt_1415210751318_0010_m_000314_1, Status : FAILED
>  Error:
>  org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed
>  1744 actions: RegionTooBusyException: 1744 times,
>   at
> 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>   at
> 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>   at
> 
> org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>   at
> 
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
> 
>  Brian
> >>
> >
>
>


Re: Themis : implements cross-row/cross-table transaction on HBase.

2014-11-11 Thread Stack
Thanks for updating the list with the nice Themis updates Jianwei. I added
Themis to the powered by list (and the other missing transactions managers,
Tephra and Haeinsa).
St.Ack

On Mon, Nov 10, 2014 at 12:50 AM, 崔建伟  wrote:

> Hi everyone:
> In the last few months, we have updated Themis to achieve better performance
> and include more features:
>
> 1. Improved single-row write performance from 23% (relative to HBase's native
> put) to 60% (for most test cases). For a single-row write transaction, we only
> write the lock to the MemStore in the prewrite phase; then, in the commit phase,
> we erase the corresponding lock and write the data and commit information to the
> HLog. This doesn't break the correctness of the percolator algorithm and improves
> performance a lot for single-row writes.
>
> 2. Support for HBase 0.98. We created a branch,
> https://github.com/XiaoMi/themis/tree/for_hbase_0.98, to make Themis
> support HBase 0.98 (currently HBase 0.98.5). All the functions of the
> master branch will also be implemented in this branch.
>
> 3. Transaction TTL support and old data cleaning. Users can set TTLs for
> read and write transactions respectively. Old data that can no longer be read
> will then be cleaned periodically.
>
> 4. MapReduce support. We implemented a group of classes to support reading data
> via Themis transactions in a Mapper job and writing data via Themis transactions
> in a Reducer job.
>
> For more details, please see GitHub:
> https://github.com/XiaoMi/themis (or
> https://github.com/XiaoMi/themis/tree/for_hbase_0.98) or the JIRA:
> https://issues.apache.org/jira/browse/HBASE-10999 . If you find Themis
> interesting, please leave us a comment on the mailing list, JIRA, or GitHub.
>
> Best
> cuijianwei
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Sunday, July 13, 2014 1:12 PM
> To: HBase Dev List
> Cc: user@hbase.apache.org
> Subject: Re: Themis : implements cross-row/cross-table transaction on
> HBase.
>
> On Tue, Jul 8, 2014 at 12:34 AM, 崔建伟  wrote:
>
> > Hi everyone, I want to introduce our open-source project Themis, which
> > implements cross-row/cross-table transactions on HBase.
> >
> > Themis follows Google's percolator algorithm (
> > http://research.google.com/pubs/pub36726.html), which provides
> > ACID-compliant transactions and snapshot isolation. The cross-row
> > transaction is based on HBase's single-row atomic semantics and doesn't use
> > a central transaction server, so it supports linear scalability.
> >
> > Themis depends on a timestamp server to provide globally, strictly
> > incrementing timestamps that define the order of transactions, which are
> > used to resolve write-write and read-write conflicts. The timestamp
> > server is lightweight and can achieve high throughput (500,000+ qps),
> > and Themis batches timestamp requests across transactions into one RPC, so
> > that it won't become the bottleneck of the system even when processing
> > billions of transactions every day.
> >
> > Although Themis could be implemented entirely on the client side, we adopt
> > HBase's coprocessor framework to achieve higher performance. Themis
> > includes a client-side library that provides transaction APIs, such as
> > themisPut/themisGet/themisScan/themisDelete, and a coprocessor library
> > loaded on the regionserver. Therefore, Themis can be used without changing
> > the code and logic of HBase.
> >
> > We have been validating the correctness of Themis for a few months with an
> > AccountTransfer simulation program, which concurrently does cross-row
> > transactions by transferring money among different accounts (each account is
> > a row in HBase) and verifies that the total money across all accounts doesn't
> > change during the simulation. We have also run Themis in our production
> > environment.
> >
> > We tested the performance of Themis and got results comparable to
> > percolator's.
> > The single-column transaction represents the worst performance case for
> > Themis compared with HBase; the results are:
> > 1) For read, the performance of percolator is 90% of HBase;
> > 2) For write, the performance of percolator is 23% of HBase.
> > The write performance drops a lot because Themis uses a two-phase commit
> > protocol to achieve transactional ACID. For multi-row writes, we improve
> > performance by parallelizing all writes of the prewrite phase. For
> > single-row writes, we are optimizing the two-phase commit protocol to achieve
> > better performance and will update the results when they are ready. The
> > details of the performance results can be found on GitHub.
> >
> > The repository and introduction of Themis include:
> > 1. Themis github: https://github.com/XiaoMi/themis/. The source code,
> > performance test results, and user guide can be found here.
> > 2. Themis jira: https://issues.apache.org/jira/browse/HBASE-10999
> > 3. Chronos github: https://github.com/XiaoMi/chronos. Chronos is our
> > open-source high-availability, high-performance timestamp server to
> p

Re: Timeseries Aggregation - TAggregator

2014-11-11 Thread Stack
nvm. I just added it myself. Let me know if you'd like me to change the
description.
St.Ack

On Tue, Nov 11, 2014 at 7:23 AM, Stack  wrote:

> Sweet. Thanks for posting notice here Julian. Add TAggregator here
> http://wiki.apache.org/hadoop/SupportingProjects ? (Make yourself a login
> and tell me what it is offlist and I'll give you edit rights).
>
> Thanks,
> St.Ack
>
>
> On Tue, Nov 11, 2014 at 5:10 AM, Julian Wissmann  > wrote:
>
>> Hi,
>>
>> I am pleased to announce the TAggregator [
>> https://github.com/juwi/HBase-TAggregator].
>> It is a coprocessor capable of returning an interval-based map of
>> aggregates.
>> So far it supports max, min, avg, and sum.
>> It can handle timestamps embedded in the key (as integers) or, as an
>> alternative, timestamps using HBase's timestamp field.
>>
>> The code still has some rough edges that need cleaning, but overall it
>> shouldn't have any pitfalls.
>>
>> If this is something you're interested in, feel free to give it a try on
>> 0.98/0.99. If you want to contribute/have ideas for cool features, let me
>> know or just send a pull request.
>>
>> Regards
>> Julian
>>
>
>


Re: Timeseries Aggregation - TAggregator

2014-11-11 Thread Stack
Sweet. Thanks for posting notice here Julian. Add TAggregator here
http://wiki.apache.org/hadoop/SupportingProjects ? (Make yourself a login
and tell me what it is offlist and I'll give you edit rights).

Thanks,
St.Ack

On Tue, Nov 11, 2014 at 5:10 AM, Julian Wissmann 
wrote:

> Hi,
>
> I am pleased to announce the TAggregator [
> https://github.com/juwi/HBase-TAggregator].
> It is a coprocessor capable of returning an interval-based map of
> aggregates.
> So far it supports max, min, avg, and sum.
> It can handle timestamps embedded in the key (as integers) or, as an
> alternative, timestamps using HBase's timestamp field.
>
> The code still has some rough edges that need cleaning, but overall it
> shouldn't have any pitfalls.
>
> If this is something you're interested in, feel free to give it a try on
> 0.98/0.99. If you want to contribute/have ideas for cool features, let me
> know or just send a pull request.
>
> Regards
> Julian
>


Timeseries Aggregation - TAggregator

2014-11-11 Thread Julian Wissmann
Hi,

I am pleased to announce the TAggregator [
https://github.com/juwi/HBase-TAggregator].
It is a coprocessor capable of returning an interval-based map of
aggregates.
So far it supports max, min, avg, and sum.
It can handle timestamps embedded in the key (as integers) or, as an
alternative, timestamps using HBase's timestamp field.

The code still has some rough edges that need cleaning, but overall it
shouldn't have any pitfalls.

If this is something you're interested in, feel free to give it a try on
0.98/0.99. If you want to contribute/have ideas for cool features, let me
know or just send a pull request.

Regards
Julian