optimal size for Hbase.hregion.memstore.flush.size and its impact

2015-08-24 Thread Gautam Borah
Hi all,

The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
Could anyone kindly explain what the impact would be if we increase this to a
higher value such as 512 MB, 800 MB, or higher?

We have a very write heavy cluster. We also run periodic endpoint coprocessor
based jobs that operate on the data written in the last 10-15 mins, every 10
minutes. We are trying to manage the memstore flush operations such that the hot
data remains in the memstore for at least 30-40 mins or longer, so that the job
hits disk only every 3rd or 4th time it tries to operate on the hot data (it
does a scan).

We have region server heap size of 20 GB and set the,

hbase.regionserver.global.memstore.lowerLimit = .45
hbase.regionserver.global.memstore.upperLimit = .55

We observed that if we set hbase.hregion.memstore.flush.size=128MB, only 10%
of the heap is utilized by the memstores before they flush.

At hbase.hregion.memstore.flush.size=512MB, we are able to increase the heap
utilization by the memstores to 35%.

It would be very helpful for us to understand the implications of a higher
hbase.hregion.memstore.flush.size for a long running cluster.
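
For reference, a minimal sketch (a client-side Java snippet for illustration only; in
practice these are hbase-site.xml settings, and the values are just the ones discussed
in this thread) of the properties involved:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MemstoreFlushSettings {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Per-region flush trigger (default 128 MB; 512 MB is the value being tested here).
        conf.setLong("hbase.hregion.memstore.flush.size", 512L * 1024 * 1024);
        // Global memstore limits, as a fraction of the 20 GB region server heap.
        conf.setFloat("hbase.regionserver.global.memstore.lowerLimit", 0.45f);
        conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.55f);
        System.out.println(conf.get("hbase.hregion.memstore.flush.size"));
      }
    }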

Thanks,
Gautam

How to scan only Memstore from end point co-processor

2015-06-01 Thread Gautam Borah
Hi all,

Here is our use case,

We have a very write heavy cluster. We also run periodic endpoint coprocessor
based jobs that operate on the data written in the last 10-15 mins, every 10
minutes.

Is there a way to query only the MemStore from the endpoint coprocessor? The
periodic job scans for data using a time range. We would like to implement a
simple logic:

a. If the query time range is within the MemStore's TimeRangeTracker, then
query only the memstore.
b. If the end time of the query time range is within the MemStore's
TimeRangeTracker, but the query start time is outside the MemStore's
TimeRangeTracker (a memstore flush happened), then query both the MemStore
and the files.
c. If the start time and end time of the query are outside of the MemStore's
TimeRangeTracker, we query only the files.

The incoming data is time series and we do not allow old data (out of sync
with the clock) to come into the system (HBase).

Cloudera has a scanner org.apache.hadoop.hbase.regionserver.InternalScan,
that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
this available in Trunk?

Also, how do I access the Memstore for a Column Family in the end point
co-processor from CoprocessorEnvironment?
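
For what it's worth, a minimal sketch (assuming a 0.98-era server-side API plus the
InternalScan(Scan) constructor from HBASE-12720 mentioned in the reply below; the helper
name is hypothetical) of a memstore-only scan from inside an endpoint coprocessor:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.HRegion;
    import org.apache.hadoop.hbase.regionserver.InternalScan;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    public class MemstoreOnlyScan {
      // Called from an endpoint coprocessor method with the environment it was given.
      static List<Cell> scanMemstoreOnly(RegionCoprocessorEnvironment env,
                                         long startTs, long endTs) throws Exception {
        HRegion region = env.getRegion();
        Scan scan = new Scan();
        scan.setTimeRange(startTs, endTs);   // the job's 10-15 minute window
        InternalScan iscan = new InternalScan(scan);
        iscan.checkOnlyMemStore();           // skip store files entirely (case a above)
        RegionScanner scanner = region.getScanner(iscan);
        List<Cell> results = new ArrayList<Cell>();
        List<Cell> batch = new ArrayList<Cell>();
        boolean more;
        do {
          batch.clear();
          more = scanner.next(batch);
          results.addAll(batch);
        } while (more);
        scanner.close();
        return results;
      }
    }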


Re: How to scan only Memstore from end point co-processor

2015-06-01 Thread Gautam Borah
Thanks Vladimir. We will try this out soon.

Regards,
Gautam

On Mon, Jun 1, 2015 at 12:22 AM, Vladimir Rodionov vladrodio...@gmail.com
wrote:

 InternalScan has ctor from Scan object

 See https://issues.apache.org/jira/browse/HBASE-12720

 You can instantiate InternalScan from Scan, set checkOnlyMemStore, then
 open RegionScanner, but the best approach is
 to cache data on write and run regular RegionScanner from memstore and
 block cache.

 best,
 -Vlad




 On Sun, May 31, 2015 at 11:45 PM, Anoop John anoop.hb...@gmail.com
 wrote:

  If your scan has a time range specified in it, HBase internally will
  check this against the time range of the files etc. and will avoid those
  which are clearly out of your interested time range.  You don't have to do
  anything for this.  Make sure you set the TimeRange for your read.
 
  -Anoop-
 
  On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan 
  ramkrishna.s.vasude...@gmail.com wrote:
 
   We have a postScannerOpen hook in the CP, but that may not give you
   direct access to know which of the internal scanners are on the Memstore
   and which are on the store files. This is possible, but we may need to
   add some new hooks at the place where we explicitly add the internal
   scanners required for a scan.
  
   But still a general question - are you sure that your data will only be
   in the memstore, and that the latest data would not have been flushed
   from your memstore to the HFiles by that time?  I see that your scenario
   is write centric, so how can you guarantee your data will be in the
   memstore only?  Though your time range may say it is the latest data
   (maybe 10 to 15 min), you should be able to configure your memstore
   flushing in such a way that there are no flushes happening for the latest
   data in that 10 to 15 min window.  Just saying my thoughts here.
  
  
  
  
   On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah gbo...@appdynamics.com
   wrote:
  
Hi all,
   
Here is our use case,
   
We have a very write heavy cluster. Also we run periodic end point co
processor based jobs that operate on the data written in the last
 10-15
mins, every 10 minute.
   
Is there a way to only query in the MemStore from the end point
co-processor? The periodic job scans for data using a time range. We
   would
like to implement a simple logic,
   
a. if query time range is within MemStore's TimeRangeTracker, then
  query
only memstore.
b. If end Time of the query time range is within MemStore's
TimeRangeTracker, but query start Time is outside MemStore's
TimeRangeTracker (memstore flush happened), then query both MemStore
  and
Files.
c. If start time and end time of the query is outside of MemStore
TimeRangeTracker we query only files.
   
The incoming data is time series and we do not allow old data (out of
   sync
from clock) to come into the system(HBase).
   
Cloudera has a scanner
  org.apache.hadoop.hbase.regionserver.InternalScan,
that has methods like checkOnlyMemStore() and checkOnlyStoreFiles().
 Is
this available in Trunk?
   
Also, how do I access the Memstore for a Column Family in the end
 point
co-processor from CoprocessorEnvironment?
   
  
 



impact of using higher Hbase.hregion.memstore.flush.size=512MB

2015-05-27 Thread Gautam Borah
Hi all,

The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
Could anyone kindly explain what the impact would be if we increase this to
a higher value such as 512 MB, 800 MB, or higher?

We have a very write heavy cluster. We also run periodic endpoint coprocessor
based jobs that operate on the data written in the last 10-15 mins, every 10
minutes. We are trying to manage the memstore flush operations such that the
hot data remains in the memstore for at least 30-40 mins or longer, so that
the job hits disk only every 3rd or 4th time it tries to operate on the hot
data (it does a scan).

We have region server heap size of 20 GB and set the,

hbase.regionserver.global.memstore.lowerLimit = .45

hbase.regionserver.global.memstore.upperLimit = .55

We observed that if we set hbase.hregion.memstore.flush.size=128MB, only
10% of the heap is utilized by the memstores before they flush.

At hbase.hregion.memstore.flush.size=512MB, we are able to increase the
heap utilization by the memstores to 35%.

It would be very helpful for us to understand the implications of a higher
hbase.hregion.memstore.flush.size for a long running cluster.

Thanks,

Gautam


Re: impact of using higher Hbase.hregion.memstore.flush.size=512MB

2015-05-27 Thread Gautam Borah
Hi Esteban,

Thanks for your response. hbase.rs.cacheblocksonwrite would be very useful
for us.
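
For illustration, a minimal sketch (assuming a 0.98-era API; the column family name "d"
is hypothetical, and the exact property/method names should be checked against your
version) of enabling cache-on-write globally or per column family:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;

    public class CacheOnWriteExample {
      public static void main(String[] args) {
        // Site-wide toggle, as mentioned above.
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.rs.cacheblocksonwrite", true);

        // Per-column-family toggle on the table descriptor.
        HColumnDescriptor cf = new HColumnDescriptor("d");
        cf.setCacheDataOnWrite(true);    // cache data blocks as they are written
        cf.setBlockCacheEnabled(true);   // keep the block cache on for reads
        System.out.println(cf);
      }
    }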

We have set hbase.regionserver.maxlogs appropriately to avoid flushes across
memstores. We also set hbase.regionserver.optionalcacheflushinterval to 0 to
disable periodic flushing; we do not write anything bypassing the WAL.

We are running the cluster with conservative limits, so that if a region
server crashes, others can take the extra load without hitting the memstore
flushing limits.

We are running the cluster now at an 800MB flush size, and the initial job
runs are fine. We will run it for a couple of days and check the status.

Thanks again.

Gautam




On Wed, May 27, 2015 at 2:15 PM, Esteban Gutierrez este...@cloudera.com
wrote:

 Gautam,

 Yes, you can increase the size of the memstore to values larger than 128MB,
 but usually you go by increasing hbase.hregion.memstore.block.multiplier
 only. Depending on the version of HBase you are running many things can
 happen, e.g. multiple memstores can be flushed at once, and/or the memstores
 will be flushed if there are too many rows in memory (30 million) or if the
 store hasn't been flushed in an hour; the rate of the flushes can be tuned,
 and hitting the max number of HLogs can also trigger a flush. One problem
 with running large memstores is mostly how many regions you will have per RS,
 and if some encoding and/or compression codec is being used it might cause
 the flush to take longer, use more CPU resources, or push back clients b/c
 you haven't flushed some regions to disk.

 Based on the behavior that you have described on the heap utilization, it
 sounds like you are not fully utilizing the memstores and you are below the
 lower limit, so depending on the version of HBase and available resources
 you might want to use hbase.rs.cacheblocksonwrite instead to keep some of
 the hot data in the block cache.

 cheers,
 esteban.




 --
 Cloudera, Inc.


 On Wed, May 27, 2015 at 1:58 PM, Gautam Borah gbo...@appdynamics.com
 wrote:

  Hi all,
 
  The default size of Hbase.hregion.memstore.flush.size is define as 128
 MB .
  Could anyone kindly explain what would be the impact if we increase this
 to
  a higher value 512 MB or 800 MB or higher.
 
  We have a very write heavy cluster. Also we run periodic end point co
  processor based jobs that operate on the data written in the last 10-15
  mins, every 10 minute. We are trying to manage the memstore flush
  operations such that the hot data remains in memstore for at least 30-40
  mins or longer, so that the job hits disk every 3rd or 4th time it tries
 to
  operate on the hot data (it does scan).
 
  We have region server heap size of 20 GB and set the,
 
  hbase.regionserver.global.memstore.lowerLimit = .45
 
  hbase.regionserver.global.memstore.upperLimit = .55
 
  We observed that if we set the Hbase.hregion.memstore.flush.size=128MB
 only
  10% of the heap is utilized by memstore, after that memstore flushes.
 
  At Hbase.hregion.memstore.flush.size=512MB, we are able to increase the
  heap utilization to by memstore to 35%.
 
  It would be very helpful for us to understand the implication of higher
  Hbase.hregion.memstore.flush.size  for a long running cluster.
 
  Thanks,
 
  Gautam
 



Re: Hbase row ingestion ..

2015-04-30 Thread Gautam
Thanks Guys for responding!

Michael,
   I indeed should have elaborated on our current rowkey design. Re:
hotspotting, we're doing exactly what you're suggesting, i.e. fanning out
into buckets where the bucket location is a hash(message_unique_fields)
(we use murmur3). So our write pattern is extremely even across the regions
and region servers. We also pre-split our table into 480 buckets (that
number is based on our experience with the rate of change of cluster size),
so no complaints about the relative load on regions. We've designed the rowkey
as per our use case and are pretty happy with it. I'm happy to keep the
rowkey size the way it is, but was concerned that we redundantly write that
very rowkey for each column (which isn't really needed). This column
qualifier optimization is over and above what we're already doing to scale
on writes, and I was wondering if that could get us improvements on write
times. But I could be wrong if that cost, of repeating the rowkey for each
cell, is purely incurred on the RS side and doesn't affect the write call
directly.
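
For context, a minimal sketch of the kind of bucketing described above (using Guava's
murmur3 hash; the helper name and field layout are hypothetical, not our actual key
schema):

    import com.google.common.hash.Hashing;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BucketedRowKey {
      static final int NUM_BUCKETS = 480;   // matches the pre-split count mentioned above

      // Prefix the key with a 2-byte bucket derived from a murmur3 hash of the
      // message's unique fields, so writes spread evenly across regions.
      static byte[] rowKey(byte[] uniqueFields, byte[] restOfKey) {
        int hash = Hashing.murmur3_32().hashBytes(uniqueFields).asInt();
        int bucket = ((hash % NUM_BUCKETS) + NUM_BUCKETS) % NUM_BUCKETS;  // keep non-negative
        return Bytes.add(Bytes.toBytes((short) bucket), uniqueFields, restOfKey);
      }
    }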

Let me also point out we're on HBase 0.98.6 currently.


James,
That talk is awesome sauce! Especially the way you guys
analyzed your design with that lovely visualization. Any chance that's on a
github repo :-) ? Would be extremely useful for folks like us. Rowkey
design has been the center of our attention for weeks/months on end and a
quicker feedback loop like this viz would really speed up that process.


Thanks again guys. All of this helps.

-Gautam.



On Thu, Apr 30, 2015 at 7:35 AM, James Estes james.es...@gmail.com wrote:

 Guatam,

 Michael makes a lot of good points. Especially the importance of analyzing
 your use case for determining the row key design. We (Jive) did a talk at
 HBasecon a couple years back talking about our row key redesign to vastly
 improve performance. It also talks a little about the write path and has a
 (crude) visualization of the impact of the old and new row key designs.
 Your use case is likely different than ours was, but it may be helpful to
 hear our experience with row key design
 http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-real-performance-gains-with-real-time-data.html

 James

 On Apr 30, 2015, at 7:51 AM, Michael Segel michael_se...@hotmail.com
 wrote:

  I wouldn’t call storing attributes in separate columns a ‘rigid schema’.
 
  You are correct that you could write your data as a CLOB/BLOB and store
 it in a single cell.
  The upside is that it's more efficient.
  The downside is that it's really an all or nothing fetch and then you
 need to write the extra code to pull data from the Avro CLOB.  (Which does
 fit your use case.)
  This is a normal pattern and gives HBase an extra dimension of storage.
 
  With respect to the row key… look at your main use case.
  The size of the row key may be a necessary evil in terms of getting the
 unique document. (clob/blob).
 
  In terms of performance gains… you need to look at it this way… the cost
 of inserting a row is what it is.
 
  There will always be a cost for insertion.
  There will always be a minimum rowkey size required by your use case.
 
  The next issue is if you are ‘hot spotting’.  Note that I’m not talking
 about the initial start of loading into a table, but if all of your data
 is going to the last region written because the rowkey is sequential.
  Here, you may look at hashing the rowkey (SHA-1 or SHA-2) which may
 shrink your row key (depending on your current rowkey length). The downside
 here is that you will lose your ability to perform range scans. So if your
 access pattern is get() rather than scan(), this will work.  Note too that
 I recommended SHA-1 or SHA-2 for the hash. MD5 works, and is faster, but
 there’s a greater chance of a hash collision. SHA-1 has a mathematical
 chance of a collision depending on data set, but I’ve never heard of anyone
 finding a collision. SHA-2 doesn’t have that problem, but I don’t know if
 it's part of the core java packages.
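
(As an aside, both SHA-1 and SHA-256 are available in core Java via
java.security.MessageDigest; a minimal sketch, with a hypothetical helper name:)

    import java.security.MessageDigest;

    public class HashedRowKey {
      // Hash the natural key with SHA-1 (20-byte digest) or SHA-256 (32-byte digest).
      // This gives an even distribution but gives up range scans over the natural key,
      // as discussed above.
      static byte[] hashKey(byte[] naturalKey, String algorithm) throws Exception {
        return MessageDigest.getInstance(algorithm).digest(naturalKey);  // "SHA-1" or "SHA-256"
      }
    }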
 
  Again here, the upside is that you’re going to get a fairly even
 distribution across your cluster. (Which you didn’t describe. That too
 could be a factor in performance.)
 
  HTH
 
  On Apr 29, 2015, at 8:03 PM, Gautam gautamkows...@gmail.com wrote:
 
  Thanks for the quick response!
 
  Our read path is fairly straightforward and very deterministic. We always
  push down predicates at the rowkey level and read the row's full payload
  (never do projection/filtering over CQs).  So.. I could, in theory, expect
  a gain as much as the current overhead of [ 40 * sizeof(rowkey) ]?
  Curious to understand more about how much of that overhead is actually
  incurred over the network and how much on the RS side. At least to the
  extent it affects the put() / flush() calls. Let me know if there are
  particular parts of the code or documentation I should be looking at for
  this. Would like to learn about the memory/network

Re: Hbase row ingestion ..

2015-04-29 Thread Gautam
.. I'd like to add that we have a very fat rowkey.

- Thanks.

On Wed, Apr 29, 2015 at 5:30 PM, Gautam gautamkows...@gmail.com wrote:

 Hello,
We'v been fighting some ingestion perf issues on hbase and I have
 been looking at the write path in particular. Trying to optimize on write
 path currently.

 We have around 40 column qualifiers (under single CF) for each row. So I
 understand that each put(row) written into hbase would translate into 40
 (rowkey, cq, ts)  cells in Hbase.  If I switched to an Avro object based
 schema instead there would be a single (rowkey, avro_cq, ts) cell per row (
 all fields shoved into a single Avro blob).  Question is, would this
 approach really translate into any write-path perf benefits?

 Cheers,
 -Gautam.






-- 
If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers...


Hbase row ingestion ..

2015-04-29 Thread Gautam
Hello,
    We've been fighting some ingestion perf issues on HBase, and I have
been looking at the write path in particular. Trying to optimize the write
path currently.

We have around 40 column qualifiers (under a single CF) for each row. So I
understand that each put(row) written into HBase would translate into 40
(rowkey, cq, ts) cells in HBase.  If I switched to an Avro object-based
schema instead, there would be a single (rowkey, avro_cq, ts) cell per row
(all fields shoved into a single Avro blob).  The question is, would this
approach really translate into any write-path perf benefits?
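
For illustration, a minimal sketch of the two write shapes being compared (0.98-era Put
API; the column family "d" and the qualifier names are hypothetical):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutShapes {
      static final byte[] CF = Bytes.toBytes("d");  // hypothetical single column family

      // Current shape: one Put carrying ~40 qualifiers; the row key is repeated in every
      // resulting cell on the wire and on disk.
      static Put widePut(byte[] rowKey, byte[][] qualifiers, byte[][] values) {
        Put put = new Put(rowKey);
        for (int i = 0; i < qualifiers.length; i++) {
          put.add(CF, qualifiers[i], values[i]);
        }
        return put;
      }

      // Alternative shape: a single qualifier holding an Avro-encoded blob of all fields,
      // so the row key is stored once per row instead of once per column.
      static Put blobPut(byte[] rowKey, byte[] avroBlob) {
        Put put = new Put(rowKey);
        put.add(CF, Bytes.toBytes("avro"), avroBlob);
        return put;
      }
    }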

Cheers,
-Gautam.


Re: Increasing write throughput..

2014-11-05 Thread Gautam
Thanks Anoop, Ted for the replies. This helped me understand Hbase's
write path a lot more.

After going through the literature and your comments on what triggers
memstore flushes,

Did the following :

 - Added 4 nodes ( all 8+4 = 12 RSs have 48000M heap each)
 - changed hbase.regionserver.maxlogs  = 150 (default 32)
 - hbase.hregion.memstore.flush.size = 536870912 ( as before )
 - hbase.hstore.blockingStoreFiles = 120
 - merged tiny/empty regions and brought down regions to 30% for this
table ( before: 1646 , after merge: ~600 )
 - lowerLimit/ upperLimit on memstore are defaults (0.38 , 0.4) , RS
MAX_HEAP_SIZE = 48000M


Snapshot of the HMaster RSs with req per sec [1]. Snapshot of hbase
tables: [2]. The region count per RS is now around 100 (evenly
distributed) and so are the requests per sec.

Based on the memstore size math, the flush size should now be =
48000*0.4/100 = 192M ? I still consistently see the memstore flushes
at ~128M.. it barely ever goes above that number. Also uploaded last
1000 lines of RS log after above settings + restart [3]

Here's the verbatim hbase-site.xml [4]


Cheers,
-Gautam.


[1] - postimg.org/image/t2cxb18sh
[2] - postimg.org/image/v3zaz9571
[3] - pastebin.com/HXK4s8zR
[4] - pastebin.com/av9XxecY



On Sun, Nov 2, 2014 at 5:46 PM, Anoop John anoop.hb...@gmail.com wrote:
 You have ~280 regions per RS.
 And your memstore size % is 40% with a heap size of 48GB.
 This means the heap available for memstores is 48 * 0.4 = 19.2GB (I am just
 considering the upper water mark alone).

 If you have to consider all 280 regions, each with a 512 MB memstore, you
 need a much larger heap.  And your writes are distributed to all regions, right?

 So you will be seeing flushes because of global heap pressure.

 Increasing the xmx and flush size alone won't help.  You need to consider
 the number of regions and the write distribution.

 When you tune this, the next step will be to tune the HLog and its rolling.
 That depends on your cell size as well.
 By default, when we reach 95% of the HDFS block size, we roll to a new HLog
 file. And by default, when we reach 32 log files, we force flushes.  FYI.

 -Anoop-
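
A minimal sketch of the WAL-related knobs described above (property names as used in the
0.94/0.98 line; the values shown are the defaults being discussed, not recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class WalRollSettings {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Roll to a new HLog at ~95% of the WAL block size (which follows the HDFS block size).
        conf.setFloat("hbase.regionserver.logroll.multiplier", 0.95f);
        // Once this many HLogs accumulate, the oldest memstores are force-flushed.
        conf.setInt("hbase.regionserver.maxlogs", 32);
        System.out.println(conf.getInt("hbase.regionserver.maxlogs", -1));
      }
    }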


 On Sat, Nov 1, 2014 at 10:54 PM, Ted Yu yuzhih...@gmail.com wrote:

 Please read 9.7.7.2. MemStoreFlush under
 http://hbase.apache.org/book.html#regions.arch

 Cheers

 On Fri, Oct 31, 2014 at 11:16 AM, Gautam Kowshik gautamkows...@gmail.com
 wrote:

  - Sorry bout the raw image upload, here’s the tsdb snapshot :
  http://postimg.org/image/gq4nf96x9/
  - Hbase version 98.1 (CDH 5.1 distro)
  - hbase-site pastebin : http://pastebin.com/fEctQ3im
  - this table ‘msg' has been pre-split with 240 regions and writes are
  evenly distributed into 240 buckets. ( the bucket is a prefix to the row
  key ) . These regions are well spread across the 8 RSs. Although over
 time
  these 240 have split and now become 2440 .. each region server has ~280
  regions.
  - last 500 lines of log from one RS : http://pastebin.com/8MwYMZPb Al
  - no hot regions from what i can tell.
 
  One of my main concerns was why even after setting the memstore flush
 size
  to 512M is it still flushing at 128M. Is there a setting i’v missed ? I’l
  try to get more details as i find them.
 
  Thanks and Cheers,
  -Gautam.
 
  On Oct 31, 2014, at 10:47 AM, Stack st...@duboce.net wrote:
 
   What version of hbase (later versions have improvements in write
   throughput, especially when many writing threads).  Post a pastebin of
   regionserver log in steadystate if you don't mind.  About how many
  writers
   going into server at a time?  How many regions on server.  All being
   written to at same rate or you have hotties?
   Thanks,
   St.Ack
  
   On Fri, Oct 31, 2014 at 10:22 AM, Gautam gautamkows...@gmail.com
  wrote:
  
   I'm trying to increase write throughput of our hbase cluster. we'r
   currently doing around 7500 messages per sec per node. I think we have
  room
   for improvement. Especially since the heap is under utilized and
  memstore
   size doesn't seem to fluctuate much between regular and peak ingestion
   loads.
  
   We mainly have one large table that we write most of the data to.
 Other
   tables are mainly opentsdb and some relatively small summary tables.
  This
   table is read in batch once a day but otherwise is mostly serving
 writes
   99% of the time. This large table has 1 CF and get's flushed at around
   ~128M fairly regularly like below..
  
   {log}
  
   2014-10-31 16:56:09,499 INFO
  org.apache.hadoop.hbase.regionserver.HRegion:
   Finished memstore flush of ~128.2 M/134459888, currentsize=879.5
  K/900640
   for region
  
 
 msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

Increasing write throughput..

2014-10-31 Thread Gautam
I'm trying to increase the write throughput of our hbase cluster. We're
currently doing around 7500 messages per sec per node. I think we have room
for improvement, especially since the heap is under-utilized and the memstore
size doesn't seem to fluctuate much between regular and peak ingestion
loads.

We mainly have one large table that we write most of the data to. Other
tables are mainly opentsdb and some relatively small summary tables. This
table is read in batch once a day but otherwise is mostly serving writes
99% of the time. This large table has 1 CF and gets flushed at around
~128M fairly regularly, like below..

{log}

2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640
for region
msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0002e5a329d2171149bcc1e83ed129312b\x00\x00\x00\x00,1413909604591.828e03c0475b699278256d4b5b9638a2.
in 640ms, sequenceid=16861176169, compaction requested=true

{log}

Here's a pastebin of my hbase site : http://pastebin.com/fEctQ3im

What I've tried..
-  turned off major compactions, and handling these manually.
-  bumped up heap Xmx from 24G to 48G
-  hbase.hregion.memstore.flush.size = 512M
-  lowerLimit/upperLimit on memstore are defaults (0.38, 0.4) since the
global heap has enough space to accommodate the default percentages.
-  Currently running HBase 98.1 on an 8 node cluster that's scaled up to
128GB RAM.


There hasn't been any appreciable increase in write perf. Still hovering
around the 7500 per node write throughput number. The flushes still seem to
be happening at 128M (instead of the expected 512M).

I've attached a snapshot of the memstore size vs. flushQueueLen. The block
caches are utilizing the extra heap space but not the memstore. The flush
queue lengths have increased, which leads me to believe that it's flushing
way too often without any increase in throughput.

Please let me know where i should dig further. That's a long email, thanks
for reading through :-)



Cheers,
-Gautam.


Re: Increasing write throughput..

2014-10-31 Thread Gautam Kowshik
- Sorry bout the raw image upload, here’s the tsdb snapshot : 
http://postimg.org/image/gq4nf96x9/
- Hbase version 98.1 (CDH 5.1 distro)
- hbase-site pastebin : http://pastebin.com/fEctQ3im
- this table 'msg' has been pre-split with 240 regions and writes are evenly
distributed into 240 buckets (the bucket is a prefix to the row key). These
regions are well spread across the 8 RSs, although over time these 240 have
split and now become 2440 .. each region server has ~280 regions.
- last 500 lines of log from one RS : http://pastebin.com/8MwYMZPb Al
- no hot regions from what i can tell. 

One of my main concerns was why, even after setting the memstore flush size to
512M, it is still flushing at 128M. Is there a setting I've missed? I'll try to
get more details as I find them.

Thanks and Cheers,
-Gautam.

On Oct 31, 2014, at 10:47 AM, Stack st...@duboce.net wrote:

 What version of hbase (later versions have improvements in write
 throughput, especially when many writing threads).  Post a pastebin of
 regionserver log in steadystate if you don't mind.  About how many writers
 going into server at a time?  How many regions on server.  All being
 written to at same rate or you have hotties?
 Thanks,
 St.Ack
 
 On Fri, Oct 31, 2014 at 10:22 AM, Gautam gautamkows...@gmail.com wrote:
 
 I'm trying to increase write throughput of our hbase cluster. we'r
 currently doing around 7500 messages per sec per node. I think we have room
 for improvement. Especially since the heap is under utilized and memstore
 size doesn't seem to fluctuate much between regular and peak ingestion
 loads.
 
 We mainly have one large table that we write most of the data to. Other
 tables are mainly opentsdb and some relatively small summary tables. This
 table is read in batch once a day but otherwise is mostly serving writes
 99% of the time. This large table has 1 CF and get's flushed at around
 ~128M fairly regularly like below..
 
 {log}
 
 2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640
 for region
 msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0002e5a329d2171149bcc1e83ed129312b\x00\x00\x00\x00,1413909604591.828e03c0475b699278256d4b5b9638a2.
 in 640ms, sequenceid=16861176169, compaction requested=true
 
 {log}
 
 Here's a pastebin of my hbase site : http://pastebin.com/fEctQ3im
 
 What i'v tried..
 -  turned of major compactions , and handling these manually.
 -  bumped up heap Xmx from 24G to 48 G
 -  hbase.hregion.memstore.flush.size = 512M
 - lowerLimit/ upperLimit on memstore are defaults (0.38 , 0.4) since the
 global heap has enough space to accommodate the default percentages.
 - Currently running Hbase 98.1 on an 8 node cluster that's scaled up to
 128GB RAM.
 
 
 There hasn't been any appreciable increase in write perf. Still hovering
 around the 7500 per node write throughput number. The flushes still seem to
 be hapenning at 128M (instead of the expected 512)
 
 I'v attached a snapshot of the memstore size vs. flushQueueLen. the block
 caches are utilizing the extra heap space but not the memstore. The flush
 Queue lengths have increased which leads me to believe that it's flushing
 way too often without any increase in throughput.
 
 Please let me know where i should dig further. That's a long email, thanks
 for reading through :-)
 
 
 
 Cheers,
 -Gautam.
 



Re: Copying data from 94 to 98 ..

2014-09-16 Thread Gautam
Jerry,
  Can you elaborate on what you mean by export table to hdfs?  I
initially tried running the export on src cluster (-copy-to
hdfs://dest/hbase ), it complains while trying to write the data to dest
cluster (due to the hdfs protocol version mismatch). Then I tried running
export on dest cluster (-copy-from hftp://src/hbase).

On Mon, Sep 15, 2014 at 10:36 PM, Jerry He jerry...@gmail.com wrote:

 While you continue on the snapshot approach, have you tried to Export the
 table in 0.94 to hdfs, and then Import the data from hdfs to 0.98?
 On Sep 15, 2014 10:19 PM, Matteo Bertozzi theo.berto...@gmail.com
 wrote:

  can you post the full exception and the file path ?
  maybe there is a bug in looking up the reference file.
  It seems to not be able to find enough data in the file...
 
  Matteo
 
 
  On Mon, Sep 15, 2014 at 10:08 PM, Gautam gautamkows...@gmail.com
 wrote:
 
   Thanks for the reply Matteo.
  
   This is exactly what I did. I modified the source cluster's dir
 structure
   to mimic that of the 98 cluster. I even got as far as it trying to look
   through the reference files.
  
   I end up with this exception :
  
   14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
   java.io.IOException
   at
  
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
   at
  
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
   at
  
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
   at
  
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
   ...
   ..
   Caused by: java.io.IOException: read=-1, wanted=4
   at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java
   ..
  
  
   This and Ted's reply about HBASE-7987 leads me to believe that the
 export
   tool from my distro is in capable of working around the regionManifest
  file
   requirement. I'm now left with the option of downgrading my dest
 cluster
  to
   94, copying data and then upgrading using the upgrade migration tool.
   Wanted to know if others have tried this or there are other things I
 can
   do. If not, i'l just go ahead and do this :-)
  
  
   Cheers,
   -Gautam.
  
  
  
   On Mon, Sep 15, 2014 at 8:10 PM, Matteo Bertozzi 
  theo.berto...@gmail.com
   wrote:
  
94 and 98 differs in directory layout
so 98 is not able to read 94 layout unless you run the migration tool
which is basically moving all the data in a default namespace
  directory
e.g.
/hbase/table - /hbase/data/default/table
/hbase/.archive/table - /hbase/archive/default/table
   
Matteo
   
   
On Mon, Sep 15, 2014 at 6:17 PM, Gautam gautamkows...@gmail.com
  wrote:
   
 Yep, looks like the CDH distro backports HBASE-7987. Having said
  that,
   is
 there a transition path for us or are we hosed :-) ? In general,
  what's
the
 recommended way to achieve this, at this point I feel i'm going
  around
the
 system to achieve what I want. If nothing else works with export
   snapshot
 should I just downgrade to 94, export snapshot and then upgrade to
  98?
   Is
 the upgrade migration path different from what export snapshot does
   (i'd
 imagine yes)?

 Cheers,
 -Gautam.




 On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com
 wrote:

  bq. 98.1 on dest cluster
 
  Looking at the history for SnapshotManifestV1, it came with
   HBASE-7987
  which went to 0.99.0
 
  Perhaps you're using a distro with HBASE-7987 ?
 
  On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com
 
wrote:
 
   Hello,
 I'm trying to copy data between Hbase clusters on
   different
   versions. I am using :
  
   /usr/bin/hbase  org.apache.hadoop.hbase.snapshot.ExportSnapshot
 -chuser hbase
 -chgroup hadoop
 -snapshot msg_snapshot
 -mappers 50
 -copy-from hftp://src-cluster:50070/hbase
 -copy-to hdfs:/dest-cluster:8020/hbase
  
  
   Till now, based on various tips from the mailing list, I have
modified
  the
   source cluster data dir paths to mimic the 98 convention
  (archive,
 table
   data paths, etc). This helped in jumping some roadblocks but
 not
   all.
  
   This is what I see now :
  
  
   14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot
 export
failed
   java.io.IOException
   at
  
  
 

   
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145

Re: Copying data from 94 to 98 ..

2014-09-15 Thread Gautam
Yep, looks like the CDH distro backports HBASE-7987. Having said that, is
there a transition path for us or are we hosed :-) ? In general, what's the
recommended way to achieve this? At this point I feel I'm going around the
system to achieve what I want. If nothing else works with export snapshot,
should I just downgrade to 94, export the snapshot, and then upgrade to 98? Is
the upgrade migration path different from what export snapshot does (I'd
imagine yes)?

Cheers,
-Gautam.




On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote:

 bq. 98.1 on dest cluster

 Looking at the history for SnapshotManifestV1, it came with HBASE-7987
 which went to 0.99.0

 Perhaps you're using a distro with HBASE-7987 ?

 On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com wrote:

  Hello,
I'm trying to copy data between Hbase clusters on different
  versions. I am using :
 
  /usr/bin/hbase  org.apache.hadoop.hbase.snapshot.ExportSnapshot
-chuser hbase
-chgroup hadoop
-snapshot msg_snapshot
-mappers 50
-copy-from hftp://src-cluster:50070/hbase
-copy-to hdfs:/dest-cluster:8020/hbase
 
 
  Till now, based on various tips from the mailing list, I have modified
 the
  source cluster data dir paths to mimic the 98 convention (archive, table
  data paths, etc). This helped in jumping some roadblocks but not all.
 
  This is what I see now :
 
 
  14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
  java.io.IOException
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
  ...
  ..
  Caused by: java.io.IOException: read=-1, wanted=4
  at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
  at
 
 
 org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
  at
 
 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java
  ..
 
 
  Fails while trying to read the reference hfile. Is this something folks have
  done before and/or is possible to do? I'd really like to do this without
  having to upgrade my source cluster or downgrade my dest cluster.
 
  I'm using 94.6 on source cluster and 98.1 on dest cluster.
 
 
  Cheers,
  -Gautam.
 




-- 
If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers...


Re: Copying data from 94 to 98 ..

2014-09-15 Thread Gautam
Thanks for the reply Matteo.

This is exactly what I did. I modified the source cluster's dir structure
to mimic that of the 98 cluster. I even got as far as it trying to look
through the reference files.

I end up with this exception :

14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
...
..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at
org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
at
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java
..


This and Ted's reply about HBASE-7987 lead me to believe that the export
tool from my distro is incapable of working around the region manifest file
requirement. I'm now left with the option of downgrading my dest cluster to
94, copying the data, and then upgrading using the upgrade migration tool.
Wanted to know if others have tried this or there are other things I can
do. If not, I'll just go ahead and do this :-)


Cheers,
-Gautam.



On Mon, Sep 15, 2014 at 8:10 PM, Matteo Bertozzi theo.berto...@gmail.com
wrote:

 94 and 98 differ in directory layout,
 so 98 is not able to read the 94 layout unless you run the migration tool,
 which basically moves all the data into a default namespace directory,
 e.g.
 /hbase/table -> /hbase/data/default/table
 /hbase/.archive/table -> /hbase/archive/default/table

 Matteo


 On Mon, Sep 15, 2014 at 6:17 PM, Gautam gautamkows...@gmail.com wrote:

  Yep, looks like the CDH distro backports HBASE-7987. Having said that, is
  there a transition path for us or are we hosed :-) ? In general, what's
 the
  recommended way to achieve this, at this point I feel i'm going around
 the
  system to achieve what I want. If nothing else works with export snapshot
  should I just downgrade to 94, export snapshot and then upgrade to 98? Is
  the upgrade migration path different from what export snapshot does (i'd
  imagine yes)?
 
  Cheers,
  -Gautam.
 
 
 
 
  On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   bq. 98.1 on dest cluster
  
   Looking at the history for SnapshotManifestV1, it came with HBASE-7987
   which went to 0.99.0
  
   Perhaps you're using a distro with HBASE-7987 ?
  
   On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com
 wrote:
  
Hello,
  I'm trying to copy data between Hbase clusters on different
versions. I am using :
   
/usr/bin/hbase  org.apache.hadoop.hbase.snapshot.ExportSnapshot
  -chuser hbase
  -chgroup hadoop
  -snapshot msg_snapshot
  -mappers 50
  -copy-from hftp://src-cluster:50070/hbase
  -copy-to hdfs:/dest-cluster:8020/hbase
   
   
Till now, based on various tips from the mailing list, I have
 modified
   the
source cluster data dir paths to mimic the 98 convention (archive,
  table
data paths, etc). This helped in jumping some roadblocks but not all.
   
This is what I see now :
   
   
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export
 failed
java.io.IOException
at
   
   
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at
   
   
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at
   
   
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at
   
   
  
 
 org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
...
..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at
   
   
  
 
 org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
at
   
   
  
 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java
..
   
   
Fails while trying to read refernce hfile. Is this something folks
 have
done before and/or is possible to do? I'd really like to do this
  without
having to upgrade my source cluster or downgrade my dest cluster.
   
I'm using 94.6 on source cluster and 98.1 on dest cluster.
   
   
Cheers,
-Gautam.
   
  
 
 
 
  --
  If you really want something in this life, you have to work for it. Now,
  quiet! They're about to announce the lottery numbers...
 




-- 
If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers...


Re: Copying data from 94 to 98 ..

2014-09-15 Thread Gautam
14/09/15 23:34:59 DEBUG snapshot.SnapshotManifestV1: Adding reference for
file (4/4): hftp://
master42.stg.com:50070/hbase/.hbase-snapshot/msg_snapshot/84f60fc2aa7e96df91e6289e6c19dc25/c/afe341e4149649578c5861e32494dbec

14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:108)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:479)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.access$200(ExportSnapshot.java:89)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportSnapshotInputFormat.getSplits(ExportSnapshot.java:600)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:751)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:905)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:975)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:979)
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at
org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
at
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java:204)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.buildManifestFromDisk(SnapshotManifestV1.java:179)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV1$1.call(SnapshotManifestV1.java:131)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV1$1.call(SnapshotManifestV1.java:127)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

On Mon, Sep 15, 2014 at 10:18 PM, Matteo Bertozzi theo.berto...@gmail.com
wrote:

 can you post the full exception and the file path ?
 maybe there is a bug in looking up the reference file.
 It seems to not be able to find enough data in the file...

 Matteo


 On Mon, Sep 15, 2014 at 10:08 PM, Gautam gautamkows...@gmail.com wrote:

  Thanks for the reply Matteo.
 
  This is exactly what I did. I modified the source cluster's dir structure
  to mimic that of the 98 cluster. I even got as far as it trying to look
  through the reference files.
 
  I end up with this exception :
 
  14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
  java.io.IOException
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
  at
 
 
 org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
  ...
  ..
  Caused by: java.io.IOException: read=-1, wanted=4
  at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
  at
 
 
 org.apache.hadoop.hbase.regionserver.StoreFileInfo.init(StoreFileInfo.java:115)
  at
 
 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java
  ..
 
 
  This and Ted's reply about HBASE-7987 leads me to believe that the export
  tool from my distro is in capable of working around the regionManifest
 file
  requirement. I'm now left with the option of downgrading my dest cluster
 to
  94, copying data and then upgrading using the upgrade migration tool.
  Wanted to know if others have

Hbase Scan/Snapshot Performance...

2014-08-12 Thread Gautam
Hello,

 We've been using and loving HBase for a couple of months now. Our primary
use case for HBase is writing events in a stream to an online time series
HBase table. Every so often we run medium to large batch scan MR jobs on
sections (1 hour, 1 day, 1 week) of this same time series table. This
online table is now showing spikes whenever these large batched read jobs
are run. Write throughput goes down while these sequential scans are
running on the table.

We've been playing around with snapshots and are considering using snapshots
to take over the responsibility for running these scheduled hourly, daily,
and weekly jobs so that the online table doesn't get affected. From preliminary
tests it looks like online snapshots take way too long. The snapshot job
times out after 60 secs. The time was spent flushing the memstores on all
region servers (as expected), which seems to take too long. Also it seems
from the RS logs like this is done serially.

Offline snapshots isn't an option since we can't disable this table which
serves the event writing.

*We're running HBase 94.6. Tried benchmarking snapshotting on a 9TB table
with 240 regions, 1 column family, 4 region servers.*

All in all, I'd like to ask if things would improve if we upgraded to HBase
0.98+. Are there known benchmark numbers on expected snapshot performance
for 94.x vs. 98.x? In an ideal scenario we'd like these MR jobs to
dynamically take a snapshot, run the job, and delete/re-use the snapshot based
on freshness. At the least, we need the snapshot to be fresh up to the last
hour.

Also, from what I understand, scans in HBase are not consistent at the table
level but are at the row level. Are there other ways I can query the online
table without hurting the write throughput?

Cheers,
-Gautam.


Re: Hbase Scan/Snapshot Performance...

2014-08-12 Thread Gautam
Thanks for the replies..

Matteo,

  We're running 94.6 since February, so sadly the prod cluster doesn't have
this SKIP_FLUSH option right now. It would be great if there are options I
could use right now until we upgrade to 98.

Ted,
 Thanks for the jira. That is exactly what we intend to use for running
the MR jobs over snapshots. Just wanted to know how easy/lightweight
snapshotting can be before we set our eyes on moving the whole thing over.
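
For what it's worth, a minimal sketch (assuming the 0.98-era TableSnapshotInputFormat API
from HBASE-8369; the mapper, job name, snapshot name, and restore directory are
hypothetical) of running an MR job over snapshot files instead of the live table:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class SnapshotScanJob {
      // Hypothetical identity mapper; a real job would aggregate the time-range data.
      public static class SnapshotMapper extends TableMapper<ImmutableBytesWritable, Result> {
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-over-snapshot");
        job.setJarByClass(SnapshotScanJob.class);
        Scan scan = new Scan();                  // optionally setTimeRange(...) for the window
        TableMapReduceUtil.initTableSnapshotMapperJob(
            "my_snapshot",                       // hypothetical snapshot name
            scan, SnapshotMapper.class,
            ImmutableBytesWritable.class, Result.class,
            job, true,
            new Path("/tmp/snapshot-restore"));  // temp dir where snapshot files are linked
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }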


Cheers,
-Gautam.



On Tue, Aug 12, 2014 at 3:24 PM, Ted Yu yuzhih...@gmail.com wrote:

 Gautum:
 Please take a look at this:
 HBASE-8369 MapReduce over snapshot files

 Cheers


 On Tue, Aug 12, 2014 at 3:11 PM, Matteo Bertozzi theo.berto...@gmail.com
 wrote:

  There is HBASE-10935, included in  0.94.21 where you can specify to skip
  the memstore flush and the result will be the online version of an
 offline
  snapshot
 
 
   snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
 
 
 
  On Tue, Aug 12, 2014 at 10:58 PM, Gautam gautamkows...@gmail.com
 wrote:
 
   Hello,
  
We'v been using and loving Hbase for couple of months now. Our
  primary
   usecase for Hbase is writing events in stream to an online time series
   Hbase table. Every so often we run medium to large batch scan MR jobs
 on
   sections (1hour, 1 day, 1 week)  of this same time series table. This
   online table is now showing spikes whenever these large batched read
 jobs
   are run. Write throughput goes down while these sequential scans are
   running on the table.
  
   We'v been playing around with snapshots and are considering using
  snapshots
   to take over the responsibility for running these scheduled hourly,
  daily,
   weekly jobs so that the online table doesn't get affected. From
  preliminary
   tests it looks like online snapshots take waay too long. The snapshot
 job
   times out after 60secs. The time was spent flushing the memstores on
 all
   region servers (as expected) which seems to take too long.  Also it
 seems
   from the RS logs like this is done serially.
  
   Offline snapshots isn't an option since we can't disable this table
 which
   serves the event writing.
  
   *We'r running Hbase 94.6. Tried benchmarking snapshotting on a 9TB
 Table
   with 240 regions, 1 Column Family, 4 region servers. *
  
   All in all, I'd like to ask if things would improve if we upgraded to
  Hbase
   0.98.+ Are there known benchmark numbers on expected snapshot
 performance
   for 94.+ vs. 98.+ ?  In an ideal scenario we'd like these MR jobs to
   dynamically take a snapshot, run the job, delete/re-use the snapshot
  based
   on freshness. At the least, we need the snapshot to be fresh until the
  last
   hour.
  
   Also from what I understand in Hbase, scans are not consistent at the
  table
   level but are at the row level. Are there other ways I can query the
  online
   table without hurting the write throughput?
  
   Cheers,
   -Gautam.
  
 




-- 
If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers...


Re: how to move hbase table data between different version hbase?

2014-07-08 Thread Gautam Gopalakrishnan
An earlier thread [1] talks about a similar problem. If the 0.96
cluster is fresh, you can copy the files across and upgrade.

1. 
http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3ccaflnt_ofhg1xgvwygpauymt-m3ncujr9rdqopdi-ad0pzca...@mail.gmail.com%3E


On Wed, Jul 9, 2014 at 1:52 PM, ch huang justlo...@gmail.com wrote:
 Hi, mailing list:
   I have two HBase environments: one is 0.94 based on CDH 4.4, another is
 0.96 based on CDH 5. I want to move the 0.94 table data to 0.96; how can I do it?
 I see the docs say HBase 0.96 made a lot of changes and is not compatible with 0.94.


best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Hello all,

I have a use case where I need to write 1 million to 10 million records
periodically (at intervals of 1 minute to 10 minutes) into an HBase
table.

Once the insert is completed, these records are queried immediately from
another program - multiple reads.

So, this is one massive write followed by many reads.

I have two approaches to insert these records into the HBase table -

Use HTable or HTableMultiplexer to stream the data to HBase table.

or

Write the data to HDFS as a sequence file (Avro in my case), run a map
reduce job using HFileOutputFormat, and then load the output files into the
HBase cluster.
Something like,

  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
  loader.doBulkLoad(new Path(outputDir), hTable);


In my use case, which approach would be better?

If I use the HTable interface, would the inserted data be in the HBase cache
(memstore), before flushing to the files, for immediate read queries?

If I use the map reduce job to insert, would the data be loaded into the HBase
cache immediately? Or would only the output files be copied to the respective
HBase table specific directories?

So, which approach is better for a massive write followed by multiple immediate
read operations?

Thanks,
Gautam
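
For illustration, a minimal sketch of both approaches (0.94-era client API; the table name
"events" and buffer size are hypothetical, and the bulk-load part just completes the
snippet above):

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class WriteApproaches {
      // Approach 1: stream Puts through HTable with a client-side write buffer.
      // Rows land in the WAL and memstore and are readable immediately after flushCommits().
      static void streamWithHTable(Configuration conf, List<Put> batch) throws Exception {
        HTable table = new HTable(conf, "events");
        table.setAutoFlush(false);                  // buffer puts on the client
        table.setWriteBufferSize(8 * 1024 * 1024);  // 8 MB buffer, illustrative only
        table.put(batch);
        table.flushCommits();                       // push any remaining buffered puts
        table.close();
      }

      // Approach 2: bulk load HFiles produced by an MR job using HFileOutputFormat.
      // The files are adopted by the table's regions and served from disk; bulk-loaded
      // data does not pass through the memstore.
      static void bulkLoad(Configuration conf, String outputDir) throws Exception {
        HTable table = new HTable(conf, "events");
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(outputDir), table);
        table.close();
      }
    }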


Re: best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Hi,

The average size of my records is 60 bytes - a 20 byte key and a 40 byte
value; the table has one column family.

I have set up a cluster for testing - 1 master and 3 region servers. Each
has a heap size of 3 GB and a single CPU.

I have pre-split the table into 30 regions. I do not have to keep data
forever; I could purge older records periodically.

Thanks,

Gautam



On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu yuzhih...@gmail.com wrote:

 Can you tell us the average size of your records and how much heap is
 given to the region servers ?

 Thanks

 On Aug 23, 2013, at 12:11 AM, Gautam Borah gautam.bo...@gmail.com wrote:

  Hello all,
 
  I have an use case where I need to write 1 million to 10 million records
  periodically (with intervals of 1 minutes to 10 minutes), into an HBase
  table.
 
  Once the insert is completed, these records are queried immediately from
  another program - multiple reads.
 
  So, this is one massive write followed by many reads.
 
  I have two approaches to insert these records into the HBase table -
 
  Use HTable or HTableMultiplexer to stream the data to HBase table.
 
  or
 
  Write the data to HDFS store as a sequence file (avro in my case) - run
 map
  reduce job using HFileOutputFormat and then load the output files into
  HBase cluster.
  Something like,
 
   LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
   loader.doBulkLoad(new Path(outputDir), hTable);
 
 
  In my use case which approach would be better?
 
  If I use HTable interface, would the inserted data be in the HBase cache,
  before flushing to the files, for immediate read queries?
 
  If I use map reduce job to insert, would the data be loaded into the
 HBase
  cache immediately? or only the output files would be copied to respective
  hbase table specific directories?
 
  So, which approach is better for write and then immediate multiple read
  operations?
 
  Thanks,
  Gautam



Re: best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Thanks Ted for your response, and for clarifying the behavior when using the
HTable interface.

What would be the behavior when inserting data using the map reduce job? Would
the recently added records be in the memstore, or do I need to load them for
read queries after the insert is done?

Thanks,
Gautam


On Fri, Aug 23, 2013 at 2:43 PM, Ted Yu yuzhih...@gmail.com wrote:

 Assuming you are using 0.94, the default value
 for hbase.regionserver.global.memstore.lowerLimit is 0.35

 Meaning, memstore on each region server would be able to hold 3000M * 0.35
 / 60 = 17.5 mil records (roughly).

 bq. If I use HTable interface, would the inserted data be in the HBase
 cache, before flushing to the files, for immediate read queries?

 Yes.

 Cheers


 On Fri, Aug 23, 2013 at 12:01 PM, Gautam Borah gautam.bo...@gmail.com
 wrote:

  Hi,
 
  Average size of my records is 60 bytes - 20 bytes Key and 40 bytes value,
  table has one column family.
 
  I have setup a cluster for testing - 1 master and 3 region servers. Each
  have a heap size of 3 GB, single cpu.
 
  I have pre-split the table into 30 regions. I do not have to keep data
  forever, I could purge older records periodically.
 
  Thanks,
 
  Gautam
 
 
 
  On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   Can you tell us the average size of your records and how much heap is
   given to the region servers ?
  
   Thanks
  
   On Aug 23, 2013, at 12:11 AM, Gautam Borah gautam.bo...@gmail.com
  wrote:
  
Hello all,
   
I have an use case where I need to write 1 million to 10 million
  records
periodically (with intervals of 1 minutes to 10 minutes), into an
 HBase
table.
   
Once the insert is completed, these records are queried immediately
  from
another program - multiple reads.
   
So, this is one massive write followed by many reads.
   
I have two approaches to insert these records into the HBase table -
   
Use HTable or HTableMultiplexer to stream the data to HBase table.
   
or
   
Write the data to HDFS store as a sequence file (avro in my case) -
 run
   map
reduce job using HFileOutputFormat and then load the output files
 into
HBase cluster.
Something like,
   
 LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
 loader.doBulkLoad(new Path(outputDir), hTable);
   
   
In my use case which approach would be better?
   
If I use HTable interface, would the inserted data be in the HBase
  cache,
before flushing to the files, for immediate read queries?
   
If I use map reduce job to insert, would the data be loaded into the
   HBase
cache immediately? or only the output files would be copied to
  respective
hbase table specific directories?
   
So, which approach is better for write and then immediate multiple
 read
operations?
   
Thanks,
Gautam