Re: Compaction problem

2013-03-25 Thread ramkrishna vasudevan
What is the rate at which you are flushing? Frequent flushes will produce more
files, so compactions will happen more often but each one will take less time.
If the flush size is increased to a bigger value then you will end up spending more
time in each compaction, because the entire (larger) file has to be read and rewritten.

After you check the flush queue, see what effect you get from increasing the
memstore size.
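
As a rough back-of-the-envelope illustration of why that happens with the
numbers quoted below in this thread (-Xmx2g from the GC options, upperLimit
0.50, a 1 GB flush size, roughly 5-7 regions per server -- treat these as
assumptions for a sketch, not measurements):

  public class MemstoreBudgetSketch {
    public static void main(String[] args) {
      long heapBytes = 2L * 1024 * 1024 * 1024;  // -Xmx2g (a 4 GB heap just doubles the numbers)
      double upperLimit = 0.50;                  // hbase.regionserver.global.memstore.upperLimit
      long flushSize = 1024L * 1024 * 1024;      // hbase.hregion.memstore.flush.size = 1 GB
      int regions = 7;                           // ~5-7 regions per region server in this setup

      long globalBudget = (long) (heapBytes * upperLimit);  // budget shared by all memstores
      long perRegionShare = globalBudget / regions;

      System.out.println("global memstore budget = " + globalBudget);    // ~1 GB
      System.out.println("per-region share       = " + perRegionShare);  // ~150 MB
      System.out.println("configured flush size  = " + flushSize);       // 1 GB, never reached
    }
  }

Each memstore gets force-flushed by global heap pressure long before it reaches
the 1 GB flush size, which is exactly the small-flush / frequent-minor-compaction
pattern Anoop points at below.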

On Tue, Mar 26, 2013 at 12:02 PM, tarang dawer wrote:

> Hi
> i tried the following parameters also
>
> export HBASE_REGIONSERVER_OPTS="-Xmx2g -Xms2g -Xmn256m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
>
> hbase.regionserver.global.memstore.upperLimit = .50
> hbase.regionserver.global.memstore.lowerLimit = .50
> hbase.regionserver.handler.count = 30
>
> but still not much effect.
>
> any suggestions , on how to improve the ingestion speed ??
>
> On Fri, Mar 22, 2013 at 9:04 PM, tarang dawer  >wrote:
>
> > 3 region servers 2 region servers having 5 regions each , 1 having 6
> > +2(meta and root)
> > 1 CF
> > set HBASE_HEAPSIZE in hbase-env.sh as 4gb .
> >
> > is the flush size okay ? or do i need to reduce/increase it ?
> >
> > i'll look into the flushQ and compactionQ size and get back to you .
> >
> > do these parameters seem okay to you ? if something seems odd / not in
> > order , please do tell
> >
> > Thanks
> > Tarang Dawer
> >
> >
> > On Fri, Mar 22, 2013 at 8:21 PM, Anoop John 
> wrote:
> >
> >> How many regions per  RS? And CF in table?
> >> What is the -Xmx for RS process? You will bget 35% of that memory for
> all
> >> the memstores in the RS.
> >> hbase.hregion.memstore.flush.size = 1GB!!
> >>
> >> Can you closely observe the flushQ size and compactionQ size?  You may
> be
> >> getting so many small file flushes(Due to global heap pressure) and
> >> subsequently many minor compactions.
> >>
> >> -Anoop-
> >>
> >> On Fri, Mar 22, 2013 at 8:14 PM, tarang dawer  >> >wrote:
> >>
> >> > Hi
> >> > As per my use case , I have to write around 100gb data , with a
> >> ingestion
> >> > speed of around 200 mbps. While writing , i am getting a performance
> >> hit by
> >> > compaction , which adds to the delay.
> >> > I am using a 8 core machine with 16 gb RAM available., 2 Tb hdd
> 7200RPM.
> >> > Got some idea from the archives and  tried pre splitting the regions ,
> >> > configured HBase with following parameters(configured the parameters
> in
> >> a
> >> > haste , so please guide me if anything's out of order) :-
> >> >
> >> >
> >> > <property>
> >> >   <name>hbase.hregion.memstore.block.multiplier</name>
> >> >   <value>4</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hregion.memstore.flush.size</name>
> >> >   <value>1073741824</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hregion.max.filesize</name>
> >> >   <value>1073741824</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hstore.compactionThreshold</name>
> >> >   <value>5</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hregion.majorcompaction</name>
> >> >   <value>0</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hstore.blockingWaitTime</name>
> >> >   <value>3</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.hstore.blockingStoreFiles</name>
> >> >   <value>200</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.regionserver.lease.period</name>
> >> >   <value>300</value>
> >> > </property>
> >> >
> >> >
> >> > but still m not able to achieve the optimal rate , getting around 110
> >> mbps.
> >> > Need some optimizations ,so please could you help out ?
> >> >
> >> > Thanks
> >> > Tarang Dawer
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Mar 22, 2013 at 6:05 PM, Jean-Marc Spaggiari <
> >> > jean-m...@spaggiari.org> wrote:
> >> >
> >> > > Hi Tarang,
> >> > >
> >> > > I will recommand you to take a look at the list archives first to
> see
> >> > > all the discussions related to compaction. You will found many
> >> > > interesting hints and tips.
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://search-hadoop.com/?q=compactions&fc_project=HBase&fc_type=mail+_hash_+user
> >> > >
> >> > > After that, you will need to provide more details regarding how you
> >> > > are using HBase and how the compaction is impacting you.
> >> > >
> >> > > JM
> >> > >
> >> > > 2013/3/22 tarang dawer :
> >> > > > Hi
> >> > > > I am using HBase 0.94.2 currently. While using it, its write
> >> > > performance
> >> > > > is being affected by compaction.
> >> > > > Please could you suggest some quick tips in relation to how to
> deal
> >> > with
> >> > > it
> >> > > > ?
> >> > > >
> >> > > > Thanks
> >> > > > Tarang Dawer
> >> > >
> >> >
> >>
> >
> >
>


Re: Compaction problem

2013-03-25 Thread tarang dawer
Hi
I tried the following parameters as well:

export HBASE_REGIONSERVER_OPTS="-Xmx2g -Xms2g -Xmn256m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

hbase.regionserver.global.memstore.upperLimit = .50
hbase.regionserver.global.memstore.lowerLimit = .50
hbase.regionserver.handler.count = 30

but still not much effect.

Any suggestions on how to improve the ingestion speed?

On Fri, Mar 22, 2013 at 9:04 PM, tarang dawer wrote:

> 3 region servers 2 region servers having 5 regions each , 1 having 6
> +2(meta and root)
> 1 CF
> set HBASE_HEAPSIZE in hbase-env.sh as 4gb .
>
> is the flush size okay ? or do i need to reduce/increase it ?
>
> i'll look into the flushQ and compactionQ size and get back to you .
>
> do these parameters seem okay to you ? if something seems odd / not in
> order , please do tell
>
> Thanks
> Tarang Dawer
>
>
> On Fri, Mar 22, 2013 at 8:21 PM, Anoop John  wrote:
>
>> How many regions per  RS? And CF in table?
>> What is the -Xmx for RS process? You will bget 35% of that memory for all
>> the memstores in the RS.
>> hbase.hregion.memstore.flush.size = 1GB!!
>>
>> Can you closely observe the flushQ size and compactionQ size?  You may be
>> getting so many small file flushes(Due to global heap pressure) and
>> subsequently many minor compactions.
>>
>> -Anoop-
>>
>> On Fri, Mar 22, 2013 at 8:14 PM, tarang dawer > >wrote:
>>
>> > Hi
>> > As per my use case , I have to write around 100gb data , with a
>> ingestion
>> > speed of around 200 mbps. While writing , i am getting a performance
>> hit by
>> > compaction , which adds to the delay.
>> > I am using a 8 core machine with 16 gb RAM available., 2 Tb hdd 7200RPM.
>> > Got some idea from the archives and  tried pre splitting the regions ,
>> > configured HBase with following parameters(configured the parameters in
>> a
>> > haste , so please guide me if anything's out of order) :-
>> >
>> >
>> > <property>
>> >   <name>hbase.hregion.memstore.block.multiplier</name>
>> >   <value>4</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hregion.memstore.flush.size</name>
>> >   <value>1073741824</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hregion.max.filesize</name>
>> >   <value>1073741824</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hstore.compactionThreshold</name>
>> >   <value>5</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hregion.majorcompaction</name>
>> >   <value>0</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hstore.blockingWaitTime</name>
>> >   <value>3</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hstore.blockingStoreFiles</name>
>> >   <value>200</value>
>> > </property>
>> > <property>
>> >   <name>hbase.regionserver.lease.period</name>
>> >   <value>300</value>
>> > </property>
>> >
>> >
>> > but still m not able to achieve the optimal rate , getting around 110
>> mbps.
>> > Need some optimizations ,so please could you help out ?
>> >
>> > Thanks
>> > Tarang Dawer
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Mar 22, 2013 at 6:05 PM, Jean-Marc Spaggiari <
>> > jean-m...@spaggiari.org> wrote:
>> >
>> > > Hi Tarang,
>> > >
>> > > I will recommand you to take a look at the list archives first to see
>> > > all the discussions related to compaction. You will found many
>> > > interesting hints and tips.
>> > >
>> > >
>> > >
>> >
>> http://search-hadoop.com/?q=compactions&fc_project=HBase&fc_type=mail+_hash_+user
>> > >
>> > > After that, you will need to provide more details regarding how you
>> > > are using HBase and how the compaction is impacting you.
>> > >
>> > > JM
>> > >
>> > > 2013/3/22 tarang dawer :
>> > > > Hi
>> > > > I am using HBase 0.94.2 currently. While using it, its write
>> > > performance
>> > > > is being affected by compaction.
>> > > > Please could you suggest some quick tips in relation to how to deal
>> > with
>> > > it
>> > > > ?
>> > > >
>> > > > Thanks
>> > > > Tarang Dawer
>> > >
>> >
>>
>
>


RE: Getting less write throughput due to more number of columns

2013-03-25 Thread Anoop Sam John
When the number of columns (qualifiers) is higher, yes, it can impact the
performance. In HBase, storage everywhere is in terms of KVs, and the key
will be something like rowkey+cfname+columnname+TS...

So when you have 26 cells in a put, many bytes are repeated in the keys (one KV
per column), so you end up transferring more data. Within the memstore, more
data (actual KV data size) gets written, and so flushes become more frequent,
etc.
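
As a rough sketch of that effect with the 0.94 client (the family and qualifier
names and the sizes below are made up for illustration), packing the same ~2 KB
record into one column versus 26 columns and comparing Put.heapSize() shows the
per-KeyValue key repetition:

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutOverheadSketch {
    public static void main(String[] args) {
      byte[] row = Bytes.toBytes("row-00000001");
      byte[] cf = Bytes.toBytes("d");
      byte[] payload = new byte[2048];               // ~2 KB of record data

      // Case 1: the whole record in a single column -> one KeyValue.
      Put single = new Put(row);
      single.add(cf, Bytes.toBytes("c0"), payload);

      // Case 2: the same 2 KB spread over 26 columns -> 26 KeyValues, each
      // carrying its own copy of row key + family + qualifier + timestamp.
      Put wide = new Put(row);
      int chunk = payload.length / 26;
      for (int i = 0; i < 26; i++) {
        wide.add(cf, Bytes.toBytes("c" + i), new byte[chunk]);
      }

      System.out.println("1 column   heapSize = " + single.heapSize());
      System.out.println("26 columns heapSize = " + wide.heapSize());
    }
  }

The same repetition is what gets shipped over the wire and written into the
memstore.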

Have a look at Intel Panthera Document Store impl.

-Anoop-

From: Ankit Jain [ankitjainc...@gmail.com]
Sent: Monday, March 25, 2013 10:19 PM
To: user@hbase.apache.org
Subject: Getting less write throughput due to more number of columns

Hi All,

I am writing records into HBase. I ran a performance test on the following
two cases:

Set1: Input record contains 26 columns and record size is 2 KB.

Set2: Input record contains 1 column and record size is 2 KB.

In the second case I am getting 8 MBps more throughput than in the first.

Does the large number of columns have any impact on write performance, and if
yes, how can we overcome it?

--
Thanks,
Ankit Jain

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread ramkrishna vasudevan
Hi Pankaj

Is it possible for you to profile the RS when this happens?  Either the Thrift
layer is adding overhead, or the code is spending more time somewhere else.

As you said, there may be a slight decrease in put performance because more
values now have to go in, but it should not be this significant.  We can work
from the profile output and check what we are doing.

Regards
Ram

On Tue, Mar 26, 2013 at 5:19 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> For a total of 1.5kb with 4 columns = 384 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
> -num_keys 100
> 13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
> Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
>
> For a total of 1.5kb with 100 columns = 15 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
> -num_keys 100
> 13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
> cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
> Current: [keys/s=162, latency=616 ms], insertedUpTo=-1
>
> So overall, the speed is the same. A bit faster with 100 columns than
> with 4. I don't think there is any negative impact on HBase side
> because of all those columns. Might be interesting to test the same
> thing over Thrift...
>
> JM
>
> 2013/3/25 Pankaj Misra :
> > Yes Ted, we have been observing Thrift API to clearly outperform Java
> native Hbase API, due to binary communication protocol, at higher loads.
> >
> > Tariq, the specs of the machine on which we are performing these tests
> are as given below.
> >
> > Processor : i3770K, 8 logical cores (4 physical, with 2 logical per
> physical core), 3.5 Ghz clock speed
> > RAM: 32 GB DDR3
> > HDD: Single SATA 2 TB disk, Two 250 GB SATA HDD - Total of 3 disks
> > HDFS and Hbase deployed in pseudo-distributed mode.
> > We are having 4 parallel streams writing to HBase.
> >
> > We used the same setup for the previous tests as well, and to be very
> frank, we did expect a bit of drop in performance when we had to test with
> 40 columns, but did not expect to get half the performance. When we tested
> with 20 columns, we were consistently getting a performance of 200 mbps of
> writes. But with 40 columns we are getting 90 mbps of throughput only on
> the same setup.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > 
> > From: Ted Yu [yuzhih...@gmail.com]
> > Sent: Tuesday, March 26, 2013 1:09 AM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > bq. These records are being written using batch mutation with thrift API
> > This is an important information, I think.
> >
> > Batch mutation through Java API would incur lower overhead.
> >
> > On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> > wrote:
> >
> >> Firstly, Thanks a lot Jean and Ted for your extended help, very much
> >> appreciate it.
> >>
> >> Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is
> >> distributed across these columns.
> >>
> >> Jean, some columns are storing as small as a single byte value, while
> few
> >> of the columns are storing as much as 80-125 bytes of data. The overall
> >> record size is 1.5 KB. These records are being written using batch
> mutation
> >> with thrift API, where in we are writing 100 records per batch mutation.
> >>
> >> Thanks and Regards
> >> Pankaj Misra
> >>
> >>
> >> 
> >> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
> >> Sent: Monday, March 25, 2013 11:57 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> I just ran some LoadTest to see if I can reproduce that.
> >>
> >> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> >> -num_keys 100
> >> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> >> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> >> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
> >>
> >> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
> >> -num_keys 100
> >>
> >> This one crashed because I don't have enought disk space, so I'm
> >> re-running it, but just before it crashed it was showing about 24.5
> >> slower. which is coherent since it's writing 25 more columns.
> >>
> >> What size of data do you have? Big cells? Small cells? I will retry
> >> the test above with more lines and keep you posted.
> >>
> >> 2013/3/25 Pankaj Misra :
> >> > Yes Ted, you are right, we are having table regions pre-split, and we
> >> see that both regions are almost evenly filled in both the tests.
> >> >
> >> > This does not seem to be a regression though, since we were getting
> good
> >> write rates when we had lesser number of columns.
> >> >
> >> > Thanks and Regards
> >> > Pankaj Misra
> 

Crash when run two jobs at the same time with same Hbase table

2013-03-25 Thread GuoWei
Dear all,

When I run two MR jobs at the same time that read the same HBase table and
write to the same target HBase table, one job finishes successfully and the
other one crashes. The error log is shown below.

Please help me find out why.


<2013-03-25 15:50:34,026>   -  map 
0% reduce 0%(JobClient.java:monitorAndPrintJob:1301)
<2013-03-25 15:50:36,096>   - Could not 
find output size (Task.java:calculateOutputSize:948)
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
output/file.out in any of the configured local directories
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
at 
org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
at org.apache.hadoop.mapred.Task.done(Task.java:875)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
<2013-03-25 15:50:36,100>   - 
(LocalJobRunner.java:statusUpdate:321)
<2013-03-25 15:50:36,102>   - Task 
'attempt_local_0001_m_00_0' done.(Task.java:sendDone:959)
<2013-03-25 15:50:36,111>  
 - Output path is null in 
cleanup(FileOutputCommitter.java:cleanupJob:100)
<2013-03-25 15:50:36,111>   - 
job_local_0001(LocalJobRunner.java:run:298)
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
output/file.out in any of the configured local directories
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
at 
org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:236)
<2013-03-25 15:50:37,029>   -  map 
100% reduce 0%(JobClient.java:monitorAndPrintJob:1301)
<2013-03-25 15:50:37,030>   - Job 
complete: job_local_0001(JobClient.java:monitorAndPrintJob:1356)
<2013-03-25 15:50:37,031>   - 
Counters: 15(Counters.java:log:585)
<2013-03-25 15:50:37,031>   -   File 
Input Format Counters (Counters.java:log:587)
<2013-03-25 15:50:37,031>   - 
Bytes Read=0(Counters.java:log:589)
<2013-03-25 15:50:37,032>   -   
FileSystemCounters(Counters.java:log:587)
<2013-03-25 15:50:37,032>   - 
FILE_BYTES_READ=10294950(Counters.java:log:589)
<2013-03-25 15:50:37,033>   - 
FILE_BYTES_WRITTEN=10432139(Counters.java:log:589)
<2013-03-25 15:50:37,033>   -   
Map-Reduce Framework(Counters.java:log:587)
<2013-03-25 15:50:37,033>   - 
Map output materialized bytes=4006(Counters.java:log:589)
<2013-03-25 15:50:37,034>   - 
Combine output records=0(Counters.java:log:589)
<2013-03-25 15:50:37,034>   - 
Map input records=500(Counters.java:log:589)
<2013-03-25 15:50:37,035>   - 
Physical memory (bytes) snapshot=0(Counters.java:log:589)
<2013-03-25 15:50:37,035>   - 
Spilled Records=500(Counters.java:log:589)
<2013-03-25 15:50:37,035>   - 
Map output bytes=3000(Counters.java:log:589)
<2013-03-25 15:50:37,036>   - 
Total committed heap usage (bytes)=202702848(Counters.java:log:589)
<2013-03-25 15:50:37,036>   - 
CPU time spent (ms)=0(Counters.java:log:589)
<2013-03-25 15:50:37,037>   - 
Virtual memory (bytes) snapshot=0(Counters.java:log:589)
<2013-03-25 15:50:37,037>   - 
SPLIT_RAW_BYTES=105(Counters.java:log:589)
<2013-03-25 15:50:37,038>   - 
Map output records=500(Counters.java:log:589)
<2013-03-25 15:50:37,038>   - 
Combine input 

Thanks a lot. 



Best Regards

Weibo: http://weibo.com/guowee
Web: http://www.wbkit.com
- 
WesternBridge Tech: Professional software service provider. Professional is 
MANNER as well CAPABILITY.



RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread 谢良
Maybe we should adopt some ideas from the RDBMS world?
In the MySQL area:
The InnoDB storage engine has a buffer pool (just like our current block cache)
that, in the latest InnoDB version, caches both compressed and uncompressed
pages, and it brings an adaptive LRU algorithm, see
http://dev.mysql.com/doc/innodb/1.1/en/innodb-compression-internals.html.
In short, it is somewhat more subtle on this detail than the leveldb/hbase
implementations, in my view. Indeed, we (Xiaomi) already have a plan to develop
and evaluate this (we logged it in our internal Phabricator system before), and
hopefully we can contribute it to the community in the future.

Another storage engine, Falcon, has a "Row Cache" feature similar to what Enis
mentioned; it is friendlier to random-read scenarios.
Every user table can choose a preferred storage engine in MySQL, so my point
is: maybe we need to consider supporting more configurable cache strategies at
per-table granularity.
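
(For context, 0.94 already exposes some coarse per-family cache knobs through
the Java admin API -- block cache on/off and an in-memory priority hint --
though nothing as fine-grained as the strategies discussed above. A minimal
sketch, with a made-up table and family name:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class PerFamilyCacheSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      HTableDescriptor desc = new HTableDescriptor("demo_table");
      HColumnDescriptor cf = new HColumnDescriptor("d");
      cf.setBlockCacheEnabled(true);  // whether this family's blocks go to the block cache at all
      cf.setInMemory(true);           // hint: keep its blocks in the LRU's in-memory priority bucket
      desc.addFamily(cf);

      admin.createTable(desc);
      admin.close();
    }
  }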

Regards,
Liang

From: Enis Söztutar [enis@gmail.com]
Sent: March 26, 2013 4:26
To: hbase-user
Cc: lars hofhansl
Subject: Re: Does HBase RegionServer benefit from OS Page Cache

> With very large heaps and a GC that can handle them (perhaps the G1 GC),
another option which might be worth experimenting with is a value (KV)
cache independent of the block cache which could be enabled on a per-table
basis
Thanks Andy for bringing this up. We've had some discussions some time ago
about a row-cache (or KV cache)
http://search-hadoop.com/m/XTlxT1xRtYw/hbase+key+value+cache+from%253Aenis&subj=RE+keyvalue+cache

The takeaway was that if you are mostly doing point gets, rather than
scans, this cache might be better.

> 1) [HBASE-7404]: L1/L2 block cache
I knew about the Bucket cache, but not that bucket cache could hold
compressed blocks. Is it the case, or are you suggesting we can add that to
this L2 cache.

>  2) [HBASE-5263] Preserving cached data on compactions through
cache-on-write
Thanks, this is the same idea. I'll track the ticket.

Enis


On Mon, Mar 25, 2013 at 12:18 PM, Liyin Tang  wrote:

> Hi Enis,
> Good ideas ! And hbase community is driving on these 2 items.
> 1) [HBASE-7404]: L1/L2 block cache
> 2) [HBASE-5263] Preserving cached data on compactions through
> cache-on-write
>
> Thanks a lot
> Liyin
> 
> From: Enis Söztutar [enis@gmail.com]
> Sent: Monday, March 25, 2013 11:24 AM
> To: hbase-user
> Cc: lars hofhansl
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> Thanks Liyin for sharing your use cases.
>
> Related to those, I was thinking of two improvements:
>  - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocs
> in its block cache, failing over the compressed one if decompressed one
> gets evicted. With very large heaps, maybe keeping around the compressed
> blocks in a secondary cache makes sense?
>  - A compaction will trash the cache. But maybe we can track keyvalues
> (inside cached blocks are cached) for the files in the compaction, and mark
> the blocks of the resulting compacted file which contain previously cached
> keyvalues to be cached after the compaction. I have to research the
> feasibility of this approach.
>
> Enis
>
>
> On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang  wrote:
>
> > Block cache is for uncompressed data while OS page contains the
> compressed
> > data. Unless the request pattern is full-table sequential scan, the block
> > cache is still quite useful. I think the size of the block cache should
> be
> > the amont of hot data we want to retain within a compaction cycle, which
> is
> > quite hard to estimate in some use cases.
> >
> >
> > Thanks a lot
> > Liyin
> > 
> > From: lars hofhansl [la...@apache.org]
> > Sent: Saturday, March 23, 2013 10:20 PM
> > To: user@hbase.apache.org
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > Interesting.
> >
> > > 2) The blocks in the block cache will be naturally invalid quickly
> after
> > the compactions.
> >
> > Should one keep the block cache small in order to increase the OS page
> > cache?
> >
> > Does you data suggest we should not use the block cache at all?
> >
> >
> > Thanks.
> >
> > -- Lars
> >
> >
> >
> > 
> >  From: Liyin Tang 
> > To: user@hbase.apache.org
> > Sent: Saturday, March 23, 2013 9:44 PM
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > We (Facebook) are closely monitoring the OS page cache hit ratio in the
> > production environments. My experience is if your data access pattern is
> > very random, then the OS page cache won't help you so much even though
> the
> > data locality is very high. On the other hand, if the requests are always
> > against the recent data points, then the page cache hit ratio could be
> much
> > higher.
> >
> > Actually, there are lots of optimizations could be done in HDFS. For
> > example, we are working on fadvice a

Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Ted Yu
Looks like this is what you were looking for:
HBASE-4063 Improve TableInputFormat to allow application to configure the
number of mappers

Cheers

On Mon, Mar 25, 2013 at 7:33 PM, Lu, Wei  wrote:

> Hi, Michael,
>
> Yes, I read some stuff in blogs and I did pre-split + large max region
> file size to avoid on line split. Also set region size large to reduce
> region server heap size, so I don't what to manually split.
>
> Let me make it clear. The problem I faced is to spawn more than one map
> task for each large region when running MR on top of hbase. Which means to
> run several map tasks each scans a row key range on each region.
>
> Thanks,
> Wei
>
>
>
> -Original Message-
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Monday, March 25, 2013 11:52 PM
> To: user@hbase.apache.org
> Subject: Re: ‘split’ start/stop key range of large table regions for more
> map tasks
>
> I think the problem is that Wei has been reading some stuff in blogs and
> that's why he has such a large region size to start with.
>
> So if he manually splits the logs, drops the region size to something more
> appropriate...
>
> Or if he unloads the table, drops the table, recreates the table with a
> smaller more reasonable region size... reloads...  he'd be better off.
>
>
> On Mar 25, 2013, at 6:20 AM, Jean-Marc Spaggiari 
> wrote:
>
> > Hi Wei,
> >
> > Have you looked at MAX_FILESIZE? If your table is 1TB size, and you
> > have 10 RS and want 12 regions per server, you can setup this to
> > 1TB/(10x12) and you will get at least all those regions (and even a
> > bit more).
> >
> > JM
> >
> > 2013/3/25 Lu, Wei :
> >> We are facing big region size but small  region number of a table. 10
> region servers, each has only one region with size over 10G, map slot of
> each task tracker is 12. We are planning to ‘split’ start/stop key range of
> large table regions for more map tasks, so that we can better make usage of
> mapreduce resource (currently only one of 12 map slot is used). I have some
> ideas below to split, please give me comments or advice.
> >> We are considering of implementing a TableInputFormat that optimized
> the method:
> >> @Override
> >> public List<InputSplit> getSplits(JobContext context) throws IOException
> >> Following is a idea:
> >>
> >> 1)  Split start/stop key range based on threshold or avg. of region
> size
> >> Set a threshold t1; collect each region’s size, if region size is
> larger than region size, then ‘split’ the range [startkey, stopkey) of the
> region, to N = {region size} / t1 sub-ranges: [startkey, stopkey1),
> [stopkey1, stopkey2),….,[stopkeyN-1, stopkey);
> >> As for  t1, we could set as we like, or leave it as the average size of
> all region size. We will set it to be a small value when each region size
> is very large, so that ‘split’ will happen;
> >>
> >> 2)  Get split key by sampling hfile block keys
> >> As for  the stopkey1, …stopkeyN-1, hbase doesn’t supply apis to get
> them and only Pair<byte[][], byte[][]> getStartEndKeys() is given to get
> start/stop key of the region. 1) We could do calculate to roughly get them,
> or 2) we can directly get each store file’s block key through Hfile.Reader
> and merge sort them. Then we can do sampling.
> >> Does this method make sense?
> >>
> >> Thanks,
> >> Wei
> >>
> >
>
>


RE: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Lu, Wei
Hi, Michael,

Yes, I read some stuff in blogs and I did pre-split + a large max region file
size to avoid online splits. I also set the region size large to reduce region
server heap size, so I don't want to split manually.

Let me make it clear: the problem I face is spawning more than one map task
for each large region when running MR on top of HBase, which means running
several map tasks that each scan a row key range within the same region.
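
For what it's worth, a rough sketch of the kind of TableInputFormat override
being described here (the class name, the SPLITS_PER_REGION constant and the
splitKeyRange() helper are made-up illustrations, not working code from this
thread):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
  import org.apache.hadoop.hbase.mapreduce.TableSplit;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.JobContext;

  public class SubSplittingTableInputFormat extends TableInputFormat {
    private static final int SPLITS_PER_REGION = 4;  // hypothetical tuning knob

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
      List<InputSplit> perRegion = super.getSplits(context);  // one split per region
      List<InputSplit> result = new ArrayList<InputSplit>();
      for (InputSplit s : perRegion) {
        TableSplit ts = (TableSplit) s;
        byte[][] bounds = splitKeyRange(ts.getStartRow(), ts.getEndRow());
        for (int i = 0; i < bounds.length - 1; i++) {
          result.add(new TableSplit(ts.getTableName(), bounds[i], bounds[i + 1],
              ts.getRegionLocation()));
        }
      }
      return result;
    }

    // Hypothetical helper: interpolate ordered boundary keys between start and
    // stop. The empty keys of the first/last region cannot be interpolated, so
    // those regions are left unsplit; sampling HFile block keys, as proposed in
    // the quoted mail below, would give better boundaries.
    private byte[][] splitKeyRange(byte[] start, byte[] stop) {
      if (start.length == 0 || stop.length == 0) {
        return new byte[][] { start, stop };
      }
      byte[][] bounds = Bytes.split(start, stop, SPLITS_PER_REGION - 1);
      return bounds != null ? bounds : new byte[][] { start, stop };
    }
  }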

Thanks,
Wei



-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com] 
Sent: Monday, March 25, 2013 11:52 PM
To: user@hbase.apache.org
Subject: Re: ‘split’ start/stop key range of large table regions for more map 
tasks

I think the problem is that Wei has been reading some stuff in blogs and that's 
why he has such a large region size to start with. 

So if he manually splits the logs, drops the region size to something more 
appropriate... 

Or if he unloads the table, drops the table, recreates the table with a smaller 
more reasonable region size... reloads...  he'd be better off. 


On Mar 25, 2013, at 6:20 AM, Jean-Marc Spaggiari  
wrote:

> Hi Wei,
> 
> Have you looked at MAX_FILESIZE? If your table is 1TB size, and you
> have 10 RS and want 12 regions per server, you can setup this to
> 1TB/(10x12) and you will get at least all those regions (and even a
> bit more).
> 
> JM
> 
> 2013/3/25 Lu, Wei :
>> We are facing big region size but small  region number of a table. 10 region 
>> servers, each has only one region with size over 10G, map slot of each task 
>> tracker is 12. We are planning to ‘split’ start/stop key range of large 
>> table regions for more map tasks, so that we can better make usage of 
>> mapreduce resource (currently only one of 12 map slot is used). I have some 
>> ideas below to split, please give me comments or advice.
>> We are considering of implementing a TableInputFormat that optimized the 
>> method:
>> @Override
>> public List<InputSplit> getSplits(JobContext context) throws IOException
>> Following is a idea:
>> 
>> 1)  Split start/stop key range based on threshold or avg. of region size
>> Set a threshold t1; collect each region’s size, if region size is larger 
>> than region size, then ‘split’ the range [startkey, stopkey) of the region, 
>> to N = {region size} / t1 sub-ranges: [startkey, stopkey1), [stopkey1, 
>> stopkey2),….,[stopkeyN-1, stopkey);
>> As for  t1, we could set as we like, or leave it as the average size of all 
>> region size. We will set it to be a small value when each region size is 
>> very large, so that ‘split’ will happen;
>> 
>> 2)  Get split key by sampling hfile block keys
>> As for  the stopkey1, …stopkeyN-1, hbase doesn’t supply apis to get them and 
>> only Pair getStartEndKeys()is given to get start/stop key 
>> of the region. 1) We could do calculate to roughly get them, or 2) we can 
>> directly get each store file’s block key through Hfile.Reader and merge sort 
>> them. Then we can do sampling.
>> Does this method make sense?
>> 
>> Thanks,
>> Wei
>> 
> 



Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
HBASE-8195 has been opened.

But I don't think this is related to the issue Jian is facing ;)

2013/3/25 Ted Yu :
> I agree.
>
> Log a JIRA ?
>
> On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> I think this should be removed from there since it has been removed
>> from the code.
>>
>> 2013/3/25 Ted Yu :
>> > I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
>> >
>> > On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
>> > jean-m...@spaggiari.org> wrote:
>> >
>> >> It's not even in 0.95 neither in trunk...
>> >>
>> >> It was supposed to be added by HBASE-5547...
>> >>
>> >> But now, it's hardcoded.
>> >>
>> >> In HConstant, you have public static final String
>> >> HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
>> >> HFileArchiveUtil. 5547 was supposed to make that configurable by
>> >> reading this property in the config with a default value as
>> >> ".archive".
>> >>
>> >> If  you look in 0.95 history, you can see that it has been succesfuly
>> >> applied.
>> >>
>> >> Then this has been removed by HBASE 6439.
>> >>
>> >> Ted, where have you found this config extract? Is it from the
>> >> documentation? Because it's not used any more. I think this should be
>> >> cleaned.
>> >>
>> >> Jian, have you recently updated from an older HBase version?
>> >>
>> >> JM
>> >>
>> >> 2013/3/25 Ted Yu :
>> >> > Do you use table archiving / snapshotting ?
>> >> > If not, the following DEBUG output should not be a concern.
>> >> >
>> >> > On a related note, I was searching for where the following config is
>> used
>> >> > in 0.94 source code but was unable to find any:
>> >> >
>> >> >   
>> >> > hbase.table.archive.directory
>> >> > .archive
>> >> > Per-table directory name under which to backup files
>> >> for a
>> >> >   table. Files are moved to the same directories as they would be
>> >> under
>> >> > the
>> >> >   table directory, but instead are just one level lower (under
>> >> >   table/.archive/... rather than table/...). Currently only
>> applies
>> >> to
>> >> > HFiles.
>> >> >   
>> >> >
>> >> > 94-hbase tyu$ find . -name '*.java' -exec grep
>> >> > "hbase.table.archive.directory" {} \; -print
>> >> >
>> >> > On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps
>> >> printing
>> >> >> the following message:
>> >> >>
>> >> >> 2013-03-25 19:32:15,469 DEBUG
>> >> >> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog
>> row(s)
>> >> and
>> >> >> gc'd 0 unreferenced parent region(s)
>> >> >> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils:
>> >> hdfs://
>> >> >> 10.2.42.55:9000/hbase/.archive doesn't exist
>> >> >> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils:
>> >> hdfs://
>> >> >> 10.2.42.55:9000/hbase/.archive doesn't exist
>> >> >>
>> >> >> Is there anything wrong here?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Jian
>> >> >>
>> >>
>>


Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
I agree.

Log a JIRA ?

On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> I think this should be removed from there since it has been removed
> from the code.
>
> 2013/3/25 Ted Yu :
> > I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
> >
> > On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> >> It's not even in 0.95 neither in trunk...
> >>
> >> It was supposed to be added by HBASE-5547...
> >>
> >> But now, it's hardcoded.
> >>
> >> In HConstant, you have public static final String
> >> HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
> >> HFileArchiveUtil. 5547 was supposed to make that configurable by
> >> reading this property in the config with a default value as
> >> ".archive".
> >>
> >> If  you look in 0.95 history, you can see that it has been succesfuly
> >> applied.
> >>
> >> Then this has been removed by HBASE 6439.
> >>
> >> Ted, where have you found this config extract? Is it from the
> >> documentation? Because it's not used any more. I think this should be
> >> cleaned.
> >>
> >> Jian, have you recently updated from an older HBase version?
> >>
> >> JM
> >>
> >> 2013/3/25 Ted Yu :
> >> > Do you use table archiving / snapshotting ?
> >> > If not, the following DEBUG output should not be a concern.
> >> >
> >> > On a related note, I was searching for where the following config is
> used
> >> > in 0.94 source code but was unable to find any:
> >> >
> >> > <property>
> >> >   <name>hbase.table.archive.directory</name>
> >> >   <value>.archive</value>
> >> >   <description>Per-table directory name under which to backup files for a
> >> >     table. Files are moved to the same directories as they would be under
> >> >     the table directory, but instead are just one level lower (under
> >> >     table/.archive/... rather than table/...). Currently only applies to
> >> >     HFiles.
> >> >   </description>
> >> > </property>
> >> >
> >> > 94-hbase tyu$ find . -name '*.java' -exec grep
> >> > "hbase.table.archive.directory" {} \; -print
> >> >
> >> > On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps
> >> printing
> >> >> the following message:
> >> >>
> >> >> 2013-03-25 19:32:15,469 DEBUG
> >> >> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog
> row(s)
> >> and
> >> >> gc'd 0 unreferenced parent region(s)
> >> >> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> >> hdfs://
> >> >> 10.2.42.55:9000/hbase/.archive doesn't exist
> >> >> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> >> hdfs://
> >> >> 10.2.42.55:9000/hbase/.archive doesn't exist
> >> >>
> >> >> Is there anything wrong here?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Jian
> >> >>
> >>
>


Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
I think this should be removed from there since it has been removed
from the code.

2013/3/25 Ted Yu :
> I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
>
> On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> It's not even in 0.95 neither in trunk...
>>
>> It was supposed to be added by HBASE-5547...
>>
>> But now, it's hardcoded.
>>
>> In HConstant, you have public static final String
>> HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
>> HFileArchiveUtil. 5547 was supposed to make that configurable by
>> reading this property in the config with a default value as
>> ".archive".
>>
>> If  you look in 0.95 history, you can see that it has been succesfuly
>> applied.
>>
>> Then this has been removed by HBASE 6439.
>>
>> Ted, where have you found this config extract? Is it from the
>> documentation? Because it's not used any more. I think this should be
>> cleaned.
>>
>> Jian, have you recently updated from an older HBase version?
>>
>> JM
>>
>> 2013/3/25 Ted Yu :
>> > Do you use table archiving / snapshotting ?
>> > If not, the following DEBUG output should not be a concern.
>> >
>> > On a related note, I was searching for where the following config is used
>> > in 0.94 source code but was unable to find any:
>> >
>> > <property>
>> >   <name>hbase.table.archive.directory</name>
>> >   <value>.archive</value>
>> >   <description>Per-table directory name under which to backup files for a
>> >     table. Files are moved to the same directories as they would be under
>> >     the table directory, but instead are just one level lower (under
>> >     table/.archive/... rather than table/...). Currently only applies to
>> >     HFiles.
>> >   </description>
>> > </property>
>> >
>> > 94-hbase tyu$ find . -name '*.java' -exec grep
>> > "hbase.table.archive.directory" {} \; -print
>> >
>> > On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps
>> printing
>> >> the following message:
>> >>
>> >> 2013-03-25 19:32:15,469 DEBUG
>> >> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s)
>> and
>> >> gc'd 0 unreferenced parent region(s)
>> >> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils:
>> hdfs://
>> >> 10.2.42.55:9000/hbase/.archive doesn't exist
>> >> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils:
>> hdfs://
>> >> 10.2.42.55:9000/hbase/.archive doesn't exist
>> >>
>> >> Is there anything wrong here?
>> >>
>> >> Thanks,
>> >>
>> >> Jian
>> >>
>>


Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.

On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> It's not even in 0.95 neither in trunk...
>
> It was supposed to be added by HBASE-5547...
>
> But now, it's hardcoded.
>
> In HConstant, you have public static final String
> HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
> HFileArchiveUtil. 5547 was supposed to make that configurable by
> reading this property in the config with a default value as
> ".archive".
>
> If  you look in 0.95 history, you can see that it has been succesfuly
> applied.
>
> Then this has been removed by HBASE 6439.
>
> Ted, where have you found this config extract? Is it from the
> documentation? Because it's not used any more. I think this should be
> cleaned.
>
> Jian, have you recently updated from an older HBase version?
>
> JM
>
> 2013/3/25 Ted Yu :
> > Do you use table archiving / snapshotting ?
> > If not, the following DEBUG output should not be a concern.
> >
> > On a related note, I was searching for where the following config is used
> > in 0.94 source code but was unable to find any:
> >
> > <property>
> >   <name>hbase.table.archive.directory</name>
> >   <value>.archive</value>
> >   <description>Per-table directory name under which to backup files for a
> >     table. Files are moved to the same directories as they would be under
> >     the table directory, but instead are just one level lower (under
> >     table/.archive/... rather than table/...). Currently only applies to
> >     HFiles.
> >   </description>
> > </property>
> >
> > 94-hbase tyu$ find . -name '*.java' -exec grep
> > "hbase.table.archive.directory" {} \; -print
> >
> > On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
> > wrote:
> >
> >> Hi,
> >>
> >> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps
> printing
> >> the following message:
> >>
> >> 2013-03-25 19:32:15,469 DEBUG
> >> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s)
> and
> >> gc'd 0 unreferenced parent region(s)
> >> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> hdfs://
> >> 10.2.42.55:9000/hbase/.archive doesn't exist
> >> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> hdfs://
> >> 10.2.42.55:9000/hbase/.archive doesn't exist
> >>
> >> Is there anything wrong here?
> >>
> >> Thanks,
> >>
> >> Jian
> >>
>


Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
It's not in 0.95 nor in trunk...

It was supposed to be added by HBASE-5547...

But now, it's hardcoded.

In HConstants, you have public static final String
HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
HFileArchiveUtil. HBASE-5547 was supposed to make that configurable by
reading this property from the config with a default value of
".archive".

If you look at the 0.95 history, you can see that it was successfully applied.

Then this was removed by HBASE-6439.

Ted, where did you find this config extract? Is it from the
documentation? Because it's not used any more. I think this should be
cleaned up.

Jian, have you recently updated from an older HBase version?

JM

2013/3/25 Ted Yu :
> Do you use table archiving / snapshotting ?
> If not, the following DEBUG output should not be a concern.
>
> On a related note, I was searching for where the following config is used
> in 0.94 source code but was unable to find any:
>
> <property>
>   <name>hbase.table.archive.directory</name>
>   <value>.archive</value>
>   <description>Per-table directory name under which to backup files for a
>     table. Files are moved to the same directories as they would be under
>     the table directory, but instead are just one level lower (under
>     table/.archive/... rather than table/...). Currently only applies to
>     HFiles.
>   </description>
> </property>
>
> 94-hbase tyu$ find . -name '*.java' -exec grep
> "hbase.table.archive.directory" {} \; -print
>
> On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
> wrote:
>
>> Hi,
>>
>> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps printing
>> the following message:
>>
>> 2013-03-25 19:32:15,469 DEBUG
>> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and
>> gc'd 0 unreferenced parent region(s)
>> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
>> 10.2.42.55:9000/hbase/.archive doesn't exist
>> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
>> 10.2.42.55:9000/hbase/.archive doesn't exist
>>
>> Is there anything wrong here?
>>
>> Thanks,
>>
>> Jian
>>


Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Jean-Marc Spaggiari
For a total of 1.5kb with 4 columns = 384 bytes/column
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
-num_keys 100
13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1

For a total of 1.5kb with 100 columns = 15 bytes/column
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
-num_keys 100
13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
Current: [keys/s=162, latency=616 ms], insertedUpTo=-1

So overall, the speed is the same. A bit faster with 100 columns than
with 4. I don't think there is any negative impact on HBase side
because of all those columns. Might be interesting to test the same
thing over Thrift...

JM

2013/3/25 Pankaj Misra :
> Yes Ted, we have been observing Thrift API to clearly outperform Java native 
> Hbase API, due to binary communication protocol, at higher loads.
>
> Tariq, the specs of the machine on which we are performing these tests are as 
> given below.
>
> Processor : i3770K, 8 logical cores (4 physical, with 2 logical per physical 
> core), 3.5 Ghz clock speed
> RAM: 32 GB DDR3
> HDD: Single SATA 2 TB disk, Two 250 GB SATA HDD - Total of 3 disks
> HDFS and Hbase deployed in pseudo-distributed mode.
> We are having 4 parallel streams writing to HBase.
>
> We used the same setup for the previous tests as well, and to be very frank, 
> we did expect a bit of drop in performance when we had to test with 40 
> columns, but did not expect to get half the performance. When we tested with 
> 20 columns, we were consistently getting a performance of 200 mbps of writes. 
> But with 40 columns we are getting 90 mbps of throughput only on the same 
> setup.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Tuesday, March 26, 2013 1:09 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> bq. These records are being written using batch mutation with thrift API
> This is an important information, I think.
>
> Batch mutation through Java API would incur lower overhead.
>
> On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> wrote:
>
>> Firstly, Thanks a lot Jean and Ted for your extended help, very much
>> appreciate it.
>>
>> Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is
>> distributed across these columns.
>>
>> Jean, some columns are storing as small as a single byte value, while few
>> of the columns are storing as much as 80-125 bytes of data. The overall
>> record size is 1.5 KB. These records are being written using batch mutation
>> with thrift API, where in we are writing 100 records per batch mutation.
>>
>> Thanks and Regards
>> Pankaj Misra
>>
>>
>> 
>> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
>> Sent: Monday, March 25, 2013 11:57 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase Writes With Large Number of Columns
>>
>> I just ran some LoadTest to see if I can reproduce that.
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
>> -num_keys 100
>> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
>> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
>> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
>> -num_keys 100
>>
>> This one crashed because I don't have enought disk space, so I'm
>> re-running it, but just before it crashed it was showing about 24.5
>> slower. which is coherent since it's writing 25 more columns.
>>
>> What size of data do you have? Big cells? Small cells? I will retry
>> the test above with more lines and keep you posted.
>>
>> 2013/3/25 Pankaj Misra :
>> > Yes Ted, you are right, we are having table regions pre-split, and we
>> see that both regions are almost evenly filled in both the tests.
>> >
>> > This does not seem to be a regression though, since we were getting good
>> write rates when we had lesser number of columns.
>> >
>> > Thanks and Regards
>> > Pankaj Misra
>> >
>> >
>> > 
>> > From: Ted Yu [yuzhih...@gmail.com]
>> > Sent: Monday, March 25, 2013 11:15 PM
>> > To: user@hbase.apache.org
>> > Cc: ankitjainc...@gmail.com
>> > Subject: Re: HBase Writes With Large Number of Columns
>> >
>> > Copying Ankit who raised the same question soon after Pankaj's initial
>> > question.
>> >
>> > On one hand I wonder if this was a regression in 0.94.5 (though
>> unlikely).
>> >
>> > Did the region servers receive (relatively) same write load for the
>> second
>> > test case ? I assume you have pre-split your tables in both cases.
>> >
>> > Cheers
>> >
>> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
>>

Re: java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Enis Söztutar
Hi,

>From the logs, it seems you are running into the same problem I have
reported last week: https://issues.apache.org/jira/browse/HBASE-8143

There are some mitigation strategies outlined in that jira. It would be
good if you can confirm:
 - How many regions in the region server
 - How many open files per region or region server (look at the num store
files)
 - Your -Xmx and -XX:MaxDirectMemorySize settings

Enis


On Mon, Mar 25, 2013 at 7:57 AM, Ted Yu  wrote:

> What version of HBase are you using ?
>
> Did you enable short circuit read in hadoop ?
>
> Thanks
>
> On Mon, Mar 25, 2013 at 4:52 AM, Dhanasekaran Anbalagan
> wrote:
>
> > Hi Guys,
> >
> > I have problem with Hbase server it's says java.lang.OutOfMemoryError:
> > Direct buffer memory
> > I new to Hbase How to solve this issue.
> >
> > This is my stake trace
> > http://paste.ubuntu.com/5646088/
> >
> >
> > -Dhanasekaran.
> > Did I learn something today? If not, I wasted it.
> >
>


Re: Incorrect Root region server

2013-03-25 Thread Ted Yu
What version of HBase are you using ?

Did table truncation report any problem ?
Are the truncated tables usable ?

Cheers

On Mon, Mar 25, 2013 at 3:56 PM, Mohit Anchlia wrote:

> I am seeing a wierd issue where zk is going to "primarymaster" (hostname)
> as a ROOT region. This host doesn't exist. Everything was working ok until
> I ran truncate on few tables. Does anyone know what might be the issue?
>


Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
Do you use table archiving / snapshotting ?
If not, the following DEBUG output should not be a concern.

On a related note, I was searching for where the following config is used
in 0.94 source code but was unable to find any:

  
<property>
  <name>hbase.table.archive.directory</name>
  <value>.archive</value>
  <description>Per-table directory name under which to backup files for a
    table. Files are moved to the same directories as they would be under
    the table directory, but instead are just one level lower (under
    table/.archive/... rather than table/...). Currently only applies to
    HFiles.
  </description>
</property>

94-hbase tyu$ find . -name '*.java' -exec grep
"hbase.table.archive.directory" {} \; -print

On Mon, Mar 25, 2013 at 12:44 PM, Jian Fang
wrote:

> Hi,
>
> I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps printing
> the following message:
>
> 2013-03-25 19:32:15,469 DEBUG
> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and
> gc'd 0 unreferenced parent region(s)
> 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
> 10.2.42.55:9000/hbase/.archive doesn't exist
> 2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
> 10.2.42.55:9000/hbase/.archive doesn't exist
>
> Is there anything wrong here?
>
> Thanks,
>
> Jian
>


Incorrect Root region server

2013-03-25 Thread Mohit Anchlia
I am seeing a weird issue where zk is pointing to "primarymaster" (hostname)
as the ROOT region server. This host doesn't exist. Everything was working ok until
I ran truncate on a few tables. Does anyone know what might be the issue?


RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Yes Ted, we have been observing the Thrift API clearly outperform the native Java
HBase API at higher loads, due to its binary communication protocol.

Tariq, the specs of the machine on which we are performing these tests are as 
given below.

Processor : i3770K, 8 logical cores (4 physical, with 2 logical per physical 
core), 3.5 Ghz clock speed
RAM: 32 GB DDR3
HDD: Single SATA 2 TB disk, Two 250 GB SATA HDD - Total of 3 disks
HDFS and Hbase deployed in pseudo-distributed mode.
We are having 4 parallel streams writing to HBase.

We used the same setup for the previous tests as well, and to be very frank, we
did expect a bit of a drop in performance when we had to test with 40 columns,
but did not expect the throughput to halve. When we tested with 20 columns, we
were consistently getting around 200 mbps of writes, but with 40 columns we are
getting only 90 mbps of throughput on the same setup.

Thanks and Regards
Pankaj Misra



From: Ted Yu [yuzhih...@gmail.com]
Sent: Tuesday, March 26, 2013 1:09 AM
To: user@hbase.apache.org
Subject: Re: HBase Writes With Large Number of Columns

bq. These records are being written using batch mutation with thrift API
This is an important information, I think.

Batch mutation through Java API would incur lower overhead.
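
(For comparison, a minimal sketch of the batched path through the 0.94 Java
client -- the table, family, qualifier names and sizes are made up;
HTable.put(List<Put>) plus a client-side write buffer is the rough Java-API
equivalent of the 100-row Thrift batch mutation described in this thread:)

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BatchedPutSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "demo_table");
      table.setAutoFlush(false);                  // buffer puts on the client side
      table.setWriteBufferSize(8 * 1024 * 1024);  // flush roughly every 8 MB

      byte[] cf = Bytes.toBytes("d");
      List<Put> batch = new ArrayList<Put>(100);
      for (int i = 0; i < 100; i++) {             // 100 rows per batch, as in the thread
        Put p = new Put(Bytes.toBytes(String.format("row-%08d", i)));
        for (int c = 0; c < 40; c++) {            // 40 columns per row
          p.add(cf, Bytes.toBytes("c" + c), new byte[38]);  // ~1.5 KB per row in total
        }
        batch.add(p);
      }
      table.put(batch);      // puts are grouped per region server into batched RPCs
      table.flushCommits();  // push whatever is still sitting in the write buffer
      table.close();
    }
  }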

On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
wrote:

> Firstly, Thanks a lot Jean and Ted for your extended help, very much
> appreciate it.
>
> Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is
> distributed across these columns.
>
> Jean, some columns are storing as small as a single byte value, while few
> of the columns are storing as much as 80-125 bytes of data. The overall
> record size is 1.5 KB. These records are being written using batch mutation
> with thrift API, where in we are writing 100 records per batch mutation.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
> Sent: Monday, March 25, 2013 11:57 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> I just ran some LoadTest to see if I can reproduce that.
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> -num_keys 100
> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
> -num_keys 100
>
> This one crashed because I don't have enought disk space, so I'm
> re-running it, but just before it crashed it was showing about 24.5
> slower. which is coherent since it's writing 25 more columns.
>
> What size of data do you have? Big cells? Small cells? I will retry
> the test above with more lines and keep you posted.
>
> 2013/3/25 Pankaj Misra :
> > Yes Ted, you are right, we are having table regions pre-split, and we
> see that both regions are almost evenly filled in both the tests.
> >
> > This does not seem to be a regression though, since we were getting good
> write rates when we had lesser number of columns.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > 
> > From: Ted Yu [yuzhih...@gmail.com]
> > Sent: Monday, March 25, 2013 11:15 PM
> > To: user@hbase.apache.org
> > Cc: ankitjainc...@gmail.com
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > Copying Ankit who raised the same question soon after Pankaj's initial
> > question.
> >
> > On one hand I wonder if this was a regression in 0.94.5 (though
> unlikely).
> >
> > Did the region servers receive (relatively) same write load for the
> second
> > test case ? I assume you have pre-split your tables in both cases.
> >
> > Cheers
> >
> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> > wrote:
> >
> >> Hi Ted,
> >>
> >> Sorry for missing that detail, we are using HBase version 0.94.5
> >>
> >> Regards
> >> Pankaj Misra
> >>
> >>
> >> 
> >> From: Ted Yu [yuzhih...@gmail.com]
> >> Sent: Monday, March 25, 2013 10:29 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> If you give us the version of HBase you're using, that would give us
> some
> >> more information to help you.
> >>
> >> Cheers
> >>
> >> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra <
> pankaj.mi...@impetus.co.in
> >> >wrote:
> >>
> >> > Hi,
> >> >
> >> > The issue that I am facing is around the performance drop of Hbase,
> when
> >> I
> >> > was having 20 columns in a column family Vs now when I am having 40
> >> columns
> >> > in a column family. The number of columns have doubled and the
> >> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> >> data
> >> > per row across 40 columns.
> >> >
> >> > Are there any settings that I should look into for twea

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Enis Söztutar
> With very large heaps and a GC that can handle them (perhaps the G1 GC),
another option which might be worth experimenting with is a value (KV)
cache independent of the block cache which could be enabled on a per-table
basis
Thanks Andy for bringing this up. We've had some discussions some time ago
about a row-cache (or KV cache)
http://search-hadoop.com/m/XTlxT1xRtYw/hbase+key+value+cache+from%253Aenis&subj=RE+keyvalue+cache

The takeaway was that if you are mostly doing point gets, rather than
scans, this cache might be better.

> 1) [HBASE-7404]: L1/L2 block cache
I knew about the Bucket cache, but not that bucket cache could hold
compressed blocks. Is it the case, or are you suggesting we can add that to
this L2 cache.

>  2) [HBASE-5263] Preserving cached data on compactions through
cache-on-write
Thanks, this is the same idea. I'll track the ticket.

Enis


On Mon, Mar 25, 2013 at 12:18 PM, Liyin Tang  wrote:

> Hi Enis,
> Good ideas ! And hbase community is driving on these 2 items.
> 1) [HBASE-7404]: L1/L2 block cache
> 2) [HBASE-5263] Preserving cached data on compactions through
> cache-on-write
>
> Thanks a lot
> Liyin
> 
> From: Enis Söztutar [enis@gmail.com]
> Sent: Monday, March 25, 2013 11:24 AM
> To: hbase-user
> Cc: lars hofhansl
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> Thanks Liyin for sharing your use cases.
>
> Related to those, I was thinking of two improvements:
>  - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks
> in its block cache, falling back to the compressed one if the decompressed one
> gets evicted. With very large heaps, maybe keeping around the compressed
> blocks in a secondary cache makes sense?
>  - A compaction will trash the cache. But maybe we can track which keyvalues
> (inside cached blocks) are cached for the files in the compaction, and mark
> the blocks of the resulting compacted file which contain previously cached
> keyvalues to be cached after the compaction. I have to research the
> feasibility of this approach.
>
> Enis
>
>
> On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang  wrote:
>
> > Block cache is for uncompressed data while the OS page cache contains the
> > compressed data. Unless the request pattern is full-table sequential scan,
> > the block cache is still quite useful. I think the size of the block cache
> > should be the amount of hot data we want to retain within a compaction
> > cycle, which is quite hard to estimate in some use cases.
> >
> >
> > Thanks a lot
> > Liyin
> > 
> > From: lars hofhansl [la...@apache.org]
> > Sent: Saturday, March 23, 2013 10:20 PM
> > To: user@hbase.apache.org
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > Interesting.
> >
> > > 2) The blocks in the block cache will be naturally invalid quickly
> after
> > the compactions.
> >
> > Should one keep the block cache small in order to increase the OS page
> > cache?
> >
> > Does your data suggest we should not use the block cache at all?
> >
> >
> > Thanks.
> >
> > -- Lars
> >
> >
> >
> > 
> >  From: Liyin Tang 
> > To: user@hbase.apache.org
> > Sent: Saturday, March 23, 2013 9:44 PM
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > We (Facebook) are closely monitoring the OS page cache hit ratio in the
> > production environments. My experience is if your data access pattern is
> > very random, then the OS page cache won't help you so much even though
> the
> > data locality is very high. On the other hand, if the requests are always
> > against the recent data points, then the page cache hit ratio could be
> much
> > higher.
> >
> > Actually, there are lots of optimizations that could be done in HDFS. For
> > example, we are working on using fadvise to drop the 2nd/3rd replica data
> > from the OS page cache, which could potentially improve your OS page cache
> > by 3X. Also, by taking advantage of tier-based compaction plus fadvise in
> > HDFS, the region server could keep more hot data in the OS page cache based
> > on the read access pattern.
> >
> > Another separate point is that we probably should NOT rely on the
> > memstore/block cache to keep hot data. 1) The more data in the memstore,
> > the more data the region server needs to recover after a server failure.
> > So the tradeoff is the recovery time. 2) The blocks in the block cache will
> > be naturally invalid quickly after the compactions. So the region server
> > probably won't benefit from a large JVM heap size at all.
> >
> > Thanks a lot
> > Liyin
> >
> > On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu  wrote:
> >
> > > Coming up is the following enhancement which would make MSLAB even
> > better:
> > >
> > > HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using
> MSLAB
> > >
> > > FYI
> > >
> > > On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta  > > >wrote:
> > >
> > > > Thanks a lot for the explanation. It's

hbase/.archive doesn't exist

2013-03-25 Thread Jian Fang
Hi,

I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps printing
the following message:

2013-03-25 19:32:15,469 DEBUG
org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and
gc'd 0 unreferenced parent region(s)
2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
10.2.42.55:9000/hbase/.archive doesn't exist
2013-03-25 19:34:05,682 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://
10.2.42.55:9000/hbase/.archive doesn't exist

Is there anything wrong here?

Thanks,

Jian


RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Liyin Tang
Hi Enis,
Good ideas ! And hbase community is driving on these 2 items.
1) [HBASE-7404]: L1/L2 block cache
2) [HBASE-5263] Preserving cached data on compactions through cache-on-write 

Thanks a lot
Liyin

From: Enis Söztutar [enis@gmail.com]
Sent: Monday, March 25, 2013 11:24 AM
To: hbase-user
Cc: lars hofhansl
Subject: Re: Does HBase RegionServer benefit from OS Page Cache

Thanks Liyin for sharing your use cases.

Related to those, I was thinking of two improvements:
 - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks
in its block cache, falling back to the compressed one if the decompressed one
gets evicted. With very large heaps, maybe keeping around the compressed
blocks in a secondary cache makes sense?
 - A compaction will trash the cache. But maybe we can track which keyvalues
(inside cached blocks) are cached for the files in the compaction, and mark
the blocks of the resulting compacted file which contain previously cached
keyvalues to be cached after the compaction. I have to research the
feasibility of this approach.

Enis


On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang  wrote:

> Block cache is for uncompressed data while the OS page cache contains the
> compressed data. Unless the request pattern is full-table sequential scan, the
> block cache is still quite useful. I think the size of the block cache should
> be the amount of hot data we want to retain within a compaction cycle, which
> is quite hard to estimate in some use cases.
>
>
> Thanks a lot
> Liyin
> 
> From: lars hofhansl [la...@apache.org]
> Sent: Saturday, March 23, 2013 10:20 PM
> To: user@hbase.apache.org
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> Interesting.
>
> > 2) The blocks in the block cache will be naturally invalid quickly after
> the compactions.
>
> Should one keep the block cache small in order to increase the OS page
> cache?
>
> Does your data suggest we should not use the block cache at all?
>
>
> Thanks.
>
> -- Lars
>
>
>
> 
>  From: Liyin Tang 
> To: user@hbase.apache.org
> Sent: Saturday, March 23, 2013 9:44 PM
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> We (Facebook) are closely monitoring the OS page cache hit ratio in the
> production environments. My experience is if your data access pattern is
> very random, then the OS page cache won't help you so much even though the
> data locality is very high. On the other hand, if the requests are always
> against the recent data points, then the page cache hit ratio could be much
> higher.
>
> Actually, there are lots of optimizations that could be done in HDFS. For
> example, we are working on using fadvise to drop the 2nd/3rd replica data from
> the OS page cache, which could potentially improve your OS page cache by 3X.
> Also, by taking advantage of tier-based compaction plus fadvise in HDFS, the
> region server could keep more hot data in the OS page cache based on the
> read access pattern.
>
> Another separate point is that we probably should NOT rely on the
> memstore/block cache to keep hot data. 1) The more data in the memstore,
> the more data the region server needs to recover after a server failure.
> So the tradeoff is the recovery time. 2) The blocks in the block cache will
> be naturally invalid quickly after the compactions. So the region server
> probably won't benefit from a large JVM heap size at all.
>
> Thanks a lot
> Liyin
>
> On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu  wrote:
>
> > Coming up is the following enhancement which would make MSLAB even
> better:
> >
> > HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB
> >
> > FYI
> >
> > On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta  > >wrote:
> >
> > > Thanks a lot for the explanation. It's good to know that MSLAB is stable
> > > and safe to enable (we don't have it enabled right now, we're using 0.92).
> > > This would allow us to more freely allocate memory to HBase. I really
> > > enjoyed the depth of explanation from both Enis and J-D. I was indeed
> > > mistakenly referring to HFile as HLog, fortunately you were still able to
> > > understand my question.
> > >
> > > Thanks,
> > > Pankaj
> > > On Mar 21, 2013, at 1:28 PM, Enis Söztutar  wrote:
> > >
> > > > I think the page cache is not totally useless, but as long as you can
> > > > control the GC, you should prefer the block cache. Some of the
> reasons
> > of
> > > > the top of my head:
> > > > - In case of a cache hit, for the OS cache, you have to go through the DN
> > > > layer (an RPC if ssr is disabled), do a kernel jump, and read using the
> > > > read() libc call, whereas for reading a block from the block cache, only
> > > > the HBase process is involved. There is no process switch involved and no
> > > > kernel jumps.
> > > > - The read access path is optimized per hfile block. FS page
> boundaries
> > > > and hfile block boundaries are not aligned at all.
> > > >

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
bq. These records are being written using batch mutation with thrift API
This is important information, I think.

Batch mutation through Java API would incur lower overhead.
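
A minimal sketch of what batching through the 0.94 Java client can look like is
below; the table name "mytable", the family "cf", the qualifiers and the value
sizes are placeholders, not details taken from this thread.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    table.setAutoFlush(false);                 // buffer puts on the client side
    table.setWriteBufferSize(4 * 1024 * 1024); // 4 MB write buffer, tune as needed
    List<Put> batch = new ArrayList<Put>();
    for (int row = 0; row < 100; row++) {      // 100 rows per batch, as in the thread
      Put put = new Put(Bytes.toBytes("row-" + row));
      for (int col = 0; col < 40; col++) {     // 40 columns in one family
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("c" + col), Bytes.toBytes("v" + col));
      }
      batch.add(put);
    }
    table.put(batch);       // queued in the client write buffer
    table.flushCommits();   // push the buffered mutations to the region servers
    table.close();
  }
}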

On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
wrote:

> Firstly, Thanks a lot Jean and Ted for your extended help, very much
> appreciate it.
>
> Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is
> distributed across these columns.
>
> Jean, some columns are storing as small as a single byte value, while a few
> of the columns are storing as much as 80-125 bytes of data. The overall
> record size is 1.5 KB. These records are being written using batch mutation
> with the thrift API, wherein we are writing 100 records per batch mutation.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
> Sent: Monday, March 25, 2013 11:57 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> I just ran some LoadTest to see if I can reproduce that.
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> -num_keys 100
> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
> -num_keys 100
>
> This one crashed because I don't have enough disk space, so I'm
> re-running it, but just before it crashed it was showing about 24.5
> times slower, which is consistent since it's writing 25 times more columns.
>
> What size of data do you have? Big cells? Small cells? I will retry
> the test above with more lines and keep you posted.
>
> 2013/3/25 Pankaj Misra :
> > Yes Ted, you are right, we are having table regions pre-split, and we
> see that both regions are almost evenly filled in both the tests.
> >
> > This does not seem to be a regression though, since we were getting good
> write rates when we had lesser number of columns.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > 
> > From: Ted Yu [yuzhih...@gmail.com]
> > Sent: Monday, March 25, 2013 11:15 PM
> > To: user@hbase.apache.org
> > Cc: ankitjainc...@gmail.com
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > Copying Ankit who raised the same question soon after Pankaj's initial
> > question.
> >
> > On one hand I wonder if this was a regression in 0.94.5 (though
> unlikely).
> >
> > Did the region servers receive (relatively) same write load for the
> second
> > test case ? I assume you have pre-split your tables in both cases.
> >
> > Cheers
> >
> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> > wrote:
> >
> >> Hi Ted,
> >>
> >> Sorry for missing that detail, we are using HBase version 0.94.5
> >>
> >> Regards
> >> Pankaj Misra
> >>
> >>
> >> 
> >> From: Ted Yu [yuzhih...@gmail.com]
> >> Sent: Monday, March 25, 2013 10:29 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> If you give us the version of HBase you're using, that would give us
> some
> >> more information to help you.
> >>
> >> Cheers
> >>
> >> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra <
> pankaj.mi...@impetus.co.in
> >> >wrote:
> >>
> >> > Hi,
> >> >
> >> > The issue that I am facing is around the performance drop of Hbase,
> when
> >> I
> >> > was having 20 columns in a column family Vs now when I am having 40
> >> columns
> >> > in a column family. The number of columns have doubled and the
> >> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> >> data
> >> > per row across 40 columns.
> >> >
> >> > Are there any settings that I should look into for tweaking Hbase to
> >> write
> >> > higher number of columns faster?
> >> >
> >> > I would request community's help to let me know how can I write to a
> >> > column family with large number of columns efficiently.
> >> >
> >> > Would greatly appreciate any help /clues around this issue.
> >> >
> >> > Thanks and Regards
> >> > Pankaj Misra
> >> >
> >> > 
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > NOTE: This message may contain information that is confidential,
> >> > proprietary, privileged or otherwise protected by law. The message is
> >> > intended solely for the named addressee. If received in error, please
> >> > destroy and notify the sender. Any use of this email is prohibited
> when
> >> > received in error. Impetus does not represent, warrant and/or
> guarantee,
> >> > that the integrity of this communication has been maintained nor that
> the
> >> > communication is free of errors, virus, interception or interference.
> >> >
> >>
> >> 
> >>
> >>
> >>
> >>
> >>
> >>
> >> NOTE: This message may contain information that is confidential,
> >> proprietary, privi

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Mohammad Tariq
Hello Pankaj,

 What is the configuration you are using? Also, the H/W specs?
Maybe tuning some of these would make things faster. Although the amount
of data being inserted is small, the amount of metadata being generated
would be higher. Now, you have to generate the key+qualifier+TS triplet
for 40 cells, against 20 as in the earlier case.
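
To make that concrete, a rough sketch of the per-cell key overhead is below. It
only sums client-side KeyValue lengths (row + family + qualifier + timestamp +
type plus the value); the row key, family and qualifier names are made up and
this is not an exact HFile size calculation.

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class CellOverheadSketch {
  // Spread ~1.5 KB of value data over numColumns cells and sum the serialized
  // KeyValue lengths, each of which carries its own key metadata.
  static long totalBytes(int numColumns) {
    byte[] row = Bytes.toBytes("row-00000001");
    byte[] family = Bytes.toBytes("cf");
    byte[] value = new byte[1536 / numColumns];
    long now = System.currentTimeMillis();
    long total = 0;
    for (int c = 0; c < numColumns; c++) {
      KeyValue kv = new KeyValue(row, family, Bytes.toBytes("column_" + c), now, value);
      total += kv.getLength();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("20 cells: " + totalBytes(20) + " bytes");
    System.out.println("40 cells: " + totalBytes(40) + " bytes");
  }
}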

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Mar 26, 2013 at 12:10 AM, Pankaj Misra
wrote:

> Firstly, Thanks a lot Jean and Ted for your extended help, very much
> appreciate it.
>
> Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is
> distributed across these columns.
>
> Jean, some columns are storing as small as a single byte value, while a few
> of the columns are storing as much as 80-125 bytes of data. The overall
> record size is 1.5 KB. These records are being written using batch mutation
> with the thrift API, wherein we are writing 100 records per batch mutation.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
> Sent: Monday, March 25, 2013 11:57 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> I just ran some LoadTest to see if I can reproduce that.
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> -num_keys 100
> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
> -num_keys 100
>
> This one crashed because I don't have enough disk space, so I'm
> re-running it, but just before it crashed it was showing about 24.5
> times slower, which is consistent since it's writing 25 times more columns.
>
> What size of data do you have? Big cells? Small cells? I will retry
> the test above with more lines and keep you posted.
>
> 2013/3/25 Pankaj Misra :
> > Yes Ted, you are right, we are having table regions pre-split, and we
> see that both regions are almost evenly filled in both the tests.
> >
> > This does not seem to be a regression though, since we were getting good
> write rates when we had lesser number of columns.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > 
> > From: Ted Yu [yuzhih...@gmail.com]
> > Sent: Monday, March 25, 2013 11:15 PM
> > To: user@hbase.apache.org
> > Cc: ankitjainc...@gmail.com
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > Copying Ankit who raised the same question soon after Pankaj's initial
> > question.
> >
> > On one hand I wonder if this was a regression in 0.94.5 (though
> unlikely).
> >
> > Did the region servers receive (relatively) same write load for the
> second
> > test case ? I assume you have pre-split your tables in both cases.
> >
> > Cheers
> >
> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> > wrote:
> >
> >> Hi Ted,
> >>
> >> Sorry for missing that detail, we are using HBase version 0.94.5
> >>
> >> Regards
> >> Pankaj Misra
> >>
> >>
> >> 
> >> From: Ted Yu [yuzhih...@gmail.com]
> >> Sent: Monday, March 25, 2013 10:29 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> If you give us the version of HBase you're using, that would give us
> some
> >> more information to help you.
> >>
> >> Cheers
> >>
> >> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra <
> pankaj.mi...@impetus.co.in
> >> >wrote:
> >>
> >> > Hi,
> >> >
> >> > The issue that I am facing is around the performance drop of Hbase,
> when
> >> I
> >> > was having 20 columns in a column family Vs now when I am having 40
> >> columns
> >> > in a column family. The number of columns have doubled and the
> >> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> >> data
> >> > per row across 40 columns.
> >> >
> >> > Are there any settings that I should look into for tweaking Hbase to
> >> write
> >> > higher number of columns faster?
> >> >
> >> > I would request community's help to let me know how can I write to a
> >> > column family with large number of columns efficiently.
> >> >
> >> > Would greatly appreciate any help /clues around this issue.
> >> >
> >> > Thanks and Regards
> >> > Pankaj Misra
> >> >
> >> > 
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > NOTE: This message may contain information that is confidential,
> >> > proprietary, privileged or otherwise protected by law. The message is
> >> > intended solely for the named addressee. If received in error, please
> >> > destroy and notify the sender. Any use of this email is prohibited
> when
> >> > received in error. Impetus does not represent, warrant and/or
> guarantee,
> >> > that the integrity of this communication has been maintained nor that
> the
> >> > c

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Andrew Purtell
> With very large heaps, maybe keeping around the compressed blocks in a
secondary cache makes sense?

That's an interesting idea.

> A compaction will trash the cache. But maybe we can track which keyvalues
(inside cached blocks) are cached for the files in the compaction, and mark the
blocks of the resulting compacted file which contain previously cached keyvalues
to be cached after the compaction.

With very large heaps and a GC that can handle them (perhaps the G1 GC),
another option which might be worth experimenting with is a value (KV)
cache independent of the block cache which could be enabled on a per-table
basis. This would not be trashed by compaction, though we'd need to do some
additional housekeeping to evict deleted cells from the value cache, and
could be useful if collectively RAM on the cluster is sufficient to hold
the whole working set in memory (for the selected tables).
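
A toy illustration of that per-table value-cache idea (purely hypothetical, not
existing HBase code): an LRU map keyed by row key and sized by entry count. A
real implementation would size by heap and would hook into deletes and
compactions for the housekeeping described above.

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowValueCache<V> {
  private final Map<String, V> cache;

  public RowValueCache(final int maxEntries) {
    // An access-ordered LinkedHashMap gives simple LRU eviction.
    this.cache = Collections.synchronizedMap(
        new LinkedHashMap<String, V>(maxEntries, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
            return size() > maxEntries;
          }
        });
  }

  public V get(String rowKey) { return cache.get(rowKey); }

  public void put(String rowKey, V value) { cache.put(rowKey, value); }

  public void invalidate(String rowKey) { cache.remove(rowKey); } // e.g. on delete
}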


On Mon, Mar 25, 2013 at 7:24 PM, Enis Söztutar  wrote:

> Thanks Liyin for sharing your use cases.
>
> Related to those, I was thinking of two improvements:
>  - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks
> in its block cache, falling back to the compressed one if the decompressed one
> gets evicted. With very large heaps, maybe keeping around the compressed
> blocks in a secondary cache makes sense?
>  - A compaction will trash the cache. But maybe we can track which keyvalues
> (inside cached blocks) are cached for the files in the compaction, and mark
> the blocks of the resulting compacted file which contain previously cached
> keyvalues to be cached after the compaction. I have to research the
> feasibility of this approach.
>
> Enis
>
>
> On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang  wrote:
>
> > Block cache is for uncompressed data while the OS page cache contains the
> > compressed data. Unless the request pattern is full-table sequential scan,
> > the block cache is still quite useful. I think the size of the block cache
> > should be the amount of hot data we want to retain within a compaction
> > cycle, which is quite hard to estimate in some use cases.
> >
> >
> > Thanks a lot
> > Liyin
> > 
> > From: lars hofhansl [la...@apache.org]
> > Sent: Saturday, March 23, 2013 10:20 PM
> > To: user@hbase.apache.org
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > Interesting.
> >
> > > 2) The blocks in the block cache will be naturally invalid quickly
> after
> > the compactions.
> >
> > Should one keep the block cache small in order to increase the OS page
> > cache?
> >
> > Does your data suggest we should not use the block cache at all?
> >
> >
> > Thanks.
> >
> > -- Lars
> >
> >
> >
> > 
> >  From: Liyin Tang 
> > To: user@hbase.apache.org
> > Sent: Saturday, March 23, 2013 9:44 PM
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > We (Facebook) are closely monitoring the OS page cache hit ratio in the
> > production environments. My experience is if your data access pattern is
> > very random, then the OS page cache won't help you so much even though
> the
> > data locality is very high. On the other hand, if the requests are always
> > against the recent data points, then the page cache hit ratio could be
> much
> > higher.
> >
> > Actually, there are lots of optimizations that could be done in HDFS. For
> > example, we are working on using fadvise to drop the 2nd/3rd replica data
> > from the OS page cache, which could potentially improve your OS page cache
> > by 3X. Also, by taking advantage of tier-based compaction plus fadvise in
> > HDFS, the region server could keep more hot data in the OS page cache based
> > on the read access pattern.
> >
> > Another separate point is that we probably should NOT rely on the
> > memstore/block cache to keep hot data. 1) The more data in the memstore,
> > the more data the region server needs to recover after a server failure.
> > So the tradeoff is the recovery time. 2) The blocks in the block cache will
> > be naturally invalid quickly after the compactions. So the region server
> > probably won't benefit from a large JVM heap size at all.
> >
> > Thanks a lot
> > Liyin
> >
> > On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu  wrote:
> >
> > > Coming up is the following enhancement which would make MSLAB even
> > better:
> > >
> > > HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using
> MSLAB
> > >
> > > FYI
> > >
> > > On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta  > > >wrote:
> > >
> > > > Thanks a lot for the explanation. It's good to know that MSLAB is stable
> > > > and safe to enable (we don't have it enabled right now, we're using 0.92).
> > > > This would allow us to more freely allocate memory to HBase. I really
> > > > enjoyed the depth of explanation from both Enis and J-D. I was indeed
> > > > mistakenly referring to HFile as HLog, fortunately you were still able to
> > > > understand my question.
> > > >
> > > > Thanks,
> > > 

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Firstly, Thanks a lot Jean and Ted for your extended help, very much appreciate 
it.

Yes Ted I am writing to all the 40 columns and 1.5 Kb of record data is 
distributed across these columns.

Jean, some columns are storing as small as a single byte value, while a few of
the columns are storing as much as 80-125 bytes of data. The overall record
size is 1.5 KB. These records are being written using batch mutation with the
thrift API, wherein we are writing 100 records per batch mutation.

Thanks and Regards
Pankaj Misra



From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
Sent: Monday, March 25, 2013 11:57 PM
To: user@hbase.apache.org
Subject: Re: HBase Writes With Large Number of Columns

I just ran some LoadTest to see if I can reproduce that.

bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
-num_keys 100
13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1

bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
-num_keys 100

This one crashed because I don't have enough disk space, so I'm
re-running it, but just before it crashed it was showing about 24.5
times slower, which is consistent since it's writing 25 times more columns.

What size of data do you have? Big cells? Small cells? I will retry
the test above with more lines and keep you posted.

2013/3/25 Pankaj Misra :
> Yes Ted, you are right, we are having table regions pre-split, and we see 
> that both regions are almost evenly filled in both the tests.
>
> This does not seem to be a regression though, since we were getting good 
> write rates when we had lesser number of columns.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Monday, March 25, 2013 11:15 PM
> To: user@hbase.apache.org
> Cc: ankitjainc...@gmail.com
> Subject: Re: HBase Writes With Large Number of Columns
>
> Copying Ankit who raised the same question soon after Pankaj's initial
> question.
>
> On one hand I wonder if this was a regression in 0.94.5 (though unlikely).
>
> Did the region servers receive (relatively) same write load for the second
> test case ? I assume you have pre-split your tables in both cases.
>
> Cheers
>
> On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> wrote:
>
>> Hi Ted,
>>
>> Sorry for missing that detail, we are using HBase version 0.94.5
>>
>> Regards
>> Pankaj Misra
>>
>>
>> 
>> From: Ted Yu [yuzhih...@gmail.com]
>> Sent: Monday, March 25, 2013 10:29 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase Writes With Large Number of Columns
>>
>> If you give us the version of HBase you're using, that would give us some
>> more information to help you.
>>
>> Cheers
>>
>> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra > >wrote:
>>
>> > Hi,
>> >
>> > The issue that I am facing is around the performance drop of Hbase, when
>> I
>> > was having 20 columns in a column family Vs now when I am having 40
>> columns
>> > in a column family. The number of columns have doubled and the
>> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
>> data
>> > per row across 40 columns.
>> >
>> > Are there any settings that I should look into for tweaking Hbase to
>> write
>> > higher number of columns faster?
>> >
>> > I would request community's help to let me know how can I write to a
>> > column family with large number of columns efficiently.
>> >
>> > Would greatly appreciate any help /clues around this issue.
>> >
>> > Thanks and Regards
>> > Pankaj Misra
>> >
>> > 
>> >
>> >
>> >
>> >
>> >
>> >
>> > NOTE: This message may contain information that is confidential,
>> > proprietary, privileged or otherwise protected by law. The message is
>> > intended solely for the named addressee. If received in error, please
>> > destroy and notify the sender. Any use of this email is prohibited when
>> > received in error. Impetus does not represent, warrant and/or guarantee,
>> > that the integrity of this communication has been maintained nor that the
>> > communication is free of errors, virus, interception or interference.
>> >
>>
>> 
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confid

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Jean-Marc Spaggiari
I just ran some LoadTest to see if I can reproduce that.

bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
-num_keys 100
13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1

bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
-num_keys 100

This one crashed because I don't have enough disk space, so I'm
re-running it, but just before it crashed it was showing about 24.5
times slower, which is consistent since it's writing 25 times more columns.

What size of data do you have? Big cells? Small cells? I will retry
the test above with more lines and keep you posted.

2013/3/25 Pankaj Misra :
> Yes Ted, you are right, we are having table regions pre-split, and we see 
> that both regions are almost evenly filled in both the tests.
>
> This does not seem to be a regression though, since we were getting good 
> write rates when we had lesser number of columns.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Monday, March 25, 2013 11:15 PM
> To: user@hbase.apache.org
> Cc: ankitjainc...@gmail.com
> Subject: Re: HBase Writes With Large Number of Columns
>
> Copying Ankit who raised the same question soon after Pankaj's initial
> question.
>
> On one hand I wonder if this was a regression in 0.94.5 (though unlikely).
>
> Did the region servers receive (relatively) same write load for the second
> test case ? I assume you have pre-split your tables in both cases.
>
> Cheers
>
> On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> wrote:
>
>> Hi Ted,
>>
>> Sorry for missing that detail, we are using HBase version 0.94.5
>>
>> Regards
>> Pankaj Misra
>>
>>
>> 
>> From: Ted Yu [yuzhih...@gmail.com]
>> Sent: Monday, March 25, 2013 10:29 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase Writes With Large Number of Columns
>>
>> If you give us the version of HBase you're using, that would give us some
>> more information to help you.
>>
>> Cheers
>>
>> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra > >wrote:
>>
>> > Hi,
>> >
>> > The issue that I am facing is around the performance drop of Hbase, when
>> I
>> > was having 20 columns in a column family Vs now when I am having 40
>> columns
>> > in a column family. The number of columns have doubled and the
>> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
>> data
>> > per row across 40 columns.
>> >
>> > Are there any settings that I should look into for tweaking Hbase to
>> write
>> > higher number of columns faster?
>> >
>> > I would request community's help to let me know how can I write to a
>> > column family with large number of columns efficiently.
>> >
>> > Would greatly appreciate any help /clues around this issue.
>> >
>> > Thanks and Regards
>> > Pankaj Misra
>> >
>> > 
>> >
>> >
>> >
>> >
>> >
>> >
>> > NOTE: This message may contain information that is confidential,
>> > proprietary, privileged or otherwise protected by law. The message is
>> > intended solely for the named addressee. If received in error, please
>> > destroy and notify the sender. Any use of this email is prohibited when
>> > received in error. Impetus does not represent, warrant and/or guarantee,
>> > that the integrity of this communication has been maintained nor that the
>> > communication is free of errors, virus, interception or interference.
>> >
>>
>> 
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential, proprietary, 
> privileged or otherwise protected by law. The message is intended solely for 
> the named addressee. If received in error, please destroy and notify the 
> sender. Any use of this email is prohibited when received in error. Impetus 
> does not represent, warrant and/or guarantee, that the integrity of this 
> communication has been maintained nor that the communication is free of 
> errors, virus, interception or interference.


Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
Final clarification:
bq. I am writing 1.5 KB of data per row across 40 columns.

So your schema is not sparse - you were writing to (all) 40 columns in the
second case.

Thanks

On Mon, Mar 25, 2013 at 11:03 AM, Pankaj Misra
wrote:

> Yes Ted, you are right, we are having table regions pre-split, and we see
> that both regions are almost evenly filled in both the tests.
>
> This does not seem to be a regression though, since we were getting good
> write rates when we had lesser number of columns.
>
> Thanks and Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Monday, March 25, 2013 11:15 PM
> To: user@hbase.apache.org
> Cc: ankitjainc...@gmail.com
> Subject: Re: HBase Writes With Large Number of Columns
>
> Copying Ankit who raised the same question soon after Pankaj's initial
> question.
>
> On one hand I wonder if this was a regression in 0.94.5 (though unlikely).
>
> Did the region servers receive (relatively) same write load for the second
> test case ? I assume you have pre-split your tables in both cases.
>
> Cheers
>
> On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> wrote:
>
> > Hi Ted,
> >
> > Sorry for missing that detail, we are using HBase version 0.94.5
> >
> > Regards
> > Pankaj Misra
> >
> >
> > 
> > From: Ted Yu [yuzhih...@gmail.com]
> > Sent: Monday, March 25, 2013 10:29 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > If you give us the version of HBase you're using, that would give us some
> > more information to help you.
> >
> > Cheers
> >
> > On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra <
> pankaj.mi...@impetus.co.in
> > >wrote:
> >
> > > Hi,
> > >
> > > The issue that I am facing is around the performance drop of Hbase,
> when
> > I
> > > was having 20 columns in a column family Vs now when I am having 40
> > columns
> > > in a column family. The number of columns have doubled and the
> > > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> > data
> > > per row across 40 columns.
> > >
> > > Are there any settings that I should look into for tweaking Hbase to
> > write
> > > higher number of columns faster?
> > >
> > > I would request community's help to let me know how can I write to a
> > > column family with large number of columns efficiently.
> > >
> > > Would greatly appreciate any help /clues around this issue.
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > 
> > >
> > >
> > >
> > >
> > >
> > >
> > > NOTE: This message may contain information that is confidential,
> > > proprietary, privileged or otherwise protected by law. The message is
> > > intended solely for the named addressee. If received in error, please
> > > destroy and notify the sender. Any use of this email is prohibited when
> > > received in error. Impetus does not represent, warrant and/or
> guarantee,
> > > that the integrity of this communication has been maintained nor that
> the
> > > communication is free of errors, virus, interception or interference.
> > >
> >
> > 
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Enis Söztutar
Thanks Liyin for sharing your use cases.

Related to those, I was thinking of two improvements:
 - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks
in its block cache, falling back to the compressed one if the decompressed one
gets evicted. With very large heaps, maybe keeping around the compressed
blocks in a secondary cache makes sense?
 - A compaction will trash the cache. But maybe we can track which keyvalues
(inside cached blocks) are cached for the files in the compaction, and mark
the blocks of the resulting compacted file which contain previously cached
keyvalues to be cached after the compaction. I have to research the
feasibility of this approach.

Enis


On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang  wrote:

> Block cache is for uncompressed data while the OS page cache contains the
> compressed data. Unless the request pattern is full-table sequential scan, the
> block cache is still quite useful. I think the size of the block cache should
> be the amount of hot data we want to retain within a compaction cycle, which
> is quite hard to estimate in some use cases.
>
>
> Thanks a lot
> Liyin
> 
> From: lars hofhansl [la...@apache.org]
> Sent: Saturday, March 23, 2013 10:20 PM
> To: user@hbase.apache.org
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> Interesting.
>
> > 2) The blocks in the block cache will be naturally invalid quickly after
> the compactions.
>
> Should one keep the block cache small in order to increase the OS page
> cache?
>
> Does your data suggest we should not use the block cache at all?
>
>
> Thanks.
>
> -- Lars
>
>
>
> 
>  From: Liyin Tang 
> To: user@hbase.apache.org
> Sent: Saturday, March 23, 2013 9:44 PM
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> We (Facebook) are closely monitoring the OS page cache hit ratio in the
> production environments. My experience is if your data access pattern is
> very random, then the OS page cache won't help you so much even though the
> data locality is very high. On the other hand, if the requests are always
> against the recent data points, then the page cache hit ratio could be much
> higher.
>
> Actually, there are lots of optimizations that could be done in HDFS. For
> example, we are working on using fadvise to drop the 2nd/3rd replica data from
> the OS page cache, which could potentially improve your OS page cache by 3X.
> Also, by taking advantage of tier-based compaction plus fadvise in HDFS, the
> region server could keep more hot data in the OS page cache based on the
> read access pattern.
>
> Another separate point is that we probably should NOT rely on the
> memstore/block cache to keep hot data. 1) The more data in the memstore,
> the more data the region server needs to recover after a server failure.
> So the tradeoff is the recovery time. 2) The blocks in the block cache will
> be naturally invalid quickly after the compactions. So the region server
> probably won't benefit from a large JVM heap size at all.
>
> Thanks a lot
> Liyin
>
> On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu  wrote:
>
> > Coming up is the following enhancement which would make MSLAB even
> better:
> >
> > HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB
> >
> > FYI
> >
> > On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta  > >wrote:
> >
> > > Thanks a lot for the explanation. It's good to know that MSLAB is stable
> > > and safe to enable (we don't have it enabled right now, we're using 0.92).
> > > This would allow us to more freely allocate memory to HBase. I really
> > > enjoyed the depth of explanation from both Enis and J-D. I was indeed
> > > mistakenly referring to HFile as HLog, fortunately you were still able to
> > > understand my question.
> > >
> > > Thanks,
> > > Pankaj
> > > On Mar 21, 2013, at 1:28 PM, Enis Söztutar  wrote:
> > >
> > > > I think the page cache is not totally useless, but as long as you can
> > > > control the GC, you should prefer the block cache. Some of the
> reasons
> > of
> > > > the top of my head:
> > > > - In case of a cache hit, for the OS cache, you have to go through the DN
> > > > layer (an RPC if ssr is disabled), do a kernel jump, and read using the
> > > > read() libc call, whereas for reading a block from the block cache, only
> > > > the HBase process is involved. There is no process switch involved and no
> > > > kernel jumps.
> > > > - The read access path is optimized per hfile block. FS page
> boundaries
> > > > and hfile block boundaries are not aligned at all.
> > > > - There is very little control to tell the page cache to cache / not
> > > > cache based on expected access patterns. For example, we can mark META
> > > > region blocks, some column families, and hfile index blocks as always
> > > > cached or cached with high priority. Also, for full table scans, we can
> > > > explicitly disable block caching to not trash the current working set.
> > > > With OS page
> > > > cac
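
As a concrete example of that last point, a full scan can opt out of the block
cache from the client side. This is a minimal sketch against the 0.9x client
API; the table name "mytable" and family "cf" are placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScanNoBlockCache {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));          // placeholder column family
    scan.setCaching(1000);                        // rows fetched per RPC
    scan.setCacheBlocks(false);                   // keep the scan from evicting hot blocks
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process each row...
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}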

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Yes Ted, you are right, we have the table regions pre-split, and we see that
both regions are almost evenly filled in both the tests.

This does not seem to be a regression though, since we were getting good write
rates when we had a smaller number of columns.

Thanks and Regards
Pankaj Misra



From: Ted Yu [yuzhih...@gmail.com]
Sent: Monday, March 25, 2013 11:15 PM
To: user@hbase.apache.org
Cc: ankitjainc...@gmail.com
Subject: Re: HBase Writes With Large Number of Columns

Copying Ankit who raised the same question soon after Pankaj's initial
question.

On one hand I wonder if this was a regression in 0.94.5 (though unlikely).

Did the region servers receive (relatively) same write load for the second
test case ? I assume you have pre-split your tables in both cases.

Cheers

On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
wrote:

> Hi Ted,
>
> Sorry for missing that detail, we are using HBase version 0.94.5
>
> Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Monday, March 25, 2013 10:29 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> If you give us the version of HBase you're using, that would give us some
> more information to help you.
>
> Cheers
>
> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra  >wrote:
>
> > Hi,
> >
> > The issue that I am facing is around the performance drop of Hbase, when
> I
> > was having 20 columns in a column family Vs now when I am having 40
> columns
> > in a column family. The number of columns have doubled and the
> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> data
> > per row across 40 columns.
> >
> > Are there any settings that I should look into for tweaking Hbase to
> write
> > higher number of columns faster?
> >
> > I would request community's help to let me know how can I write to a
> > column family with large number of columns efficiently.
> >
> > Would greatly appreciate any help /clues around this issue.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> > 
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>










Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
Copying Ankit who raised the same question soon after Pankaj's initial
question.

On one hand I wonder if this was a regression in 0.94.5 (though unlikely).

Did the region servers receive (relatively) same write load for the second
test case ? I assume you have pre-split your tables in both cases.

Cheers

On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
wrote:

> Hi Ted,
>
> Sorry for missing that detail, we are using HBase version 0.94.5
>
> Regards
> Pankaj Misra
>
>
> 
> From: Ted Yu [yuzhih...@gmail.com]
> Sent: Monday, March 25, 2013 10:29 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Writes With Large Number of Columns
>
> If you give us the version of HBase you're using, that would give us some
> more information to help you.
>
> Cheers
>
> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra  >wrote:
>
> > Hi,
> >
> > The issue that I am facing is around the performance drop of Hbase, when
> I
> > was having 20 columns in a column family Vs now when I am having 40
> columns
> > in a column family. The number of columns have doubled and the
> > ingestion/write speed has also dropped by half. I am writing 1.5 KB of
> data
> > per row across 40 columns.
> >
> > Are there any settings that I should look into for tweaking Hbase to
> write
> > higher number of columns faster?
> >
> > I would request community's help to let me know how can I write to a
> > column family with large number of columns efficiently.
> >
> > Would greatly appreciate any help /clues around this issue.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> > 
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Hi Ted,

Sorry for missing that detail, we are using HBase version 0.94.5

Regards
Pankaj Misra



From: Ted Yu [yuzhih...@gmail.com]
Sent: Monday, March 25, 2013 10:29 PM
To: user@hbase.apache.org
Subject: Re: HBase Writes With Large Number of Columns

If you give us the version of HBase you're using, that would give us some
more information to help you.

Cheers

On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra wrote:

> Hi,
>
> The issue that I am facing is around the performance drop of Hbase, when I
> was having 20 columns in a column family Vs now when I am having 40 columns
> in a column family. The number of columns have doubled and the
> ingestion/write speed has also dropped by half. I am writing 1.5 KB of data
> per row across 40 columns.
>
> Are there any settings that I should look into for tweaking Hbase to write
> higher number of columns faster?
>
> I would request community's help to let me know how can I write to a
> column family with large number of columns efficiently.
>
> Would greatly appreciate any help /clues around this issue.
>
> Thanks and Regards
> Pankaj Misra
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>










Getting less write throughput due to more number of columns

2013-03-25 Thread Ankit Jain
Hi All,

I am writing records into HBase. I ran a performance test on the following
two cases:

Set1: Input record contains 26 columns and record size is 2 KB.

Set2: Input record contains 1 column and record size is 2 KB.

In the second case I am getting 8 MBps more write throughput than in the first.

Does the large number of columns have any impact on write performance, and if
yes, how can we overcome it?

-- 
Thanks,
Ankit Jain


Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
If you give us the version of HBase you're using, that would give us some
more information to help you.

Cheers

On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra wrote:

> Hi,
>
> The issue that I am facing is around the performance drop of Hbase, when I
> was having 20 columns in a column family Vs now when I am having 40 columns
> in a column family. The number of columns have doubled and the
> ingestion/write speed has also dropped by half. I am writing 1.5 KB of data
> per row across 40 columns.
>
> Are there any settings that I should look into for tweaking Hbase to write
> higher number of columns faster?
>
> I would request community's help to let me know how can I write to a
> column family with large number of columns efficiently.
>
> Would greatly appreciate any help /clues around this issue.
>
> Thanks and Regards
> Pankaj Misra
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Hi,

The issue that I am facing is around the performance drop of HBase when I was
having 20 columns in a column family vs. now when I am having 40 columns in a
column family. The number of columns has doubled and the ingestion/write speed
has also dropped by half. I am writing 1.5 KB of data per row across 40 columns.

Are there any settings that I should look into for tweaking HBase to write a
higher number of columns faster?

I would request the community's help to let me know how I can write to a column
family with a large number of columns efficiently.

Would greatly appreciate any help /clues around this issue.

Thanks and Regards
Pankaj Misra










Re: HBase M/R with M/R and HBase not on same cluster

2013-03-25 Thread Michael Segel
Just out of curiosity...

Why do you want to run the job on Cluster A that reads from Cluster B but 
writes to Cluster A? 

Wouldn't it be easier to run the job on Cluster B and, inside Mapper.setup(),
create your own configuration for the second cluster's output?
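
A rough sketch of that setup() approach, assuming the job runs on the cluster
that hosts the source table and writes through a plain HTable pointed at the
other cluster's ZooKeeper quorum. The quorum hosts and the table name
"target_table" are placeholders, not values from this thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class CrossClusterCopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
  private HTable outputTable;  // lives on the destination cluster

  @Override
  protected void setup(Context context) throws IOException {
    // Point a fresh configuration at the destination cluster's ZooKeeper quorum.
    Configuration destConf = HBaseConfiguration.create();
    destConf.set("hbase.zookeeper.quorum", "zk-a-1,zk-a-2,zk-a-3");
    destConf.set("hbase.zookeeper.property.clientPort", "2181");
    outputTable = new HTable(destConf, "target_table");
    outputTable.setAutoFlush(false);  // buffer writes to the remote cluster
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(value.getRow());
    for (KeyValue kv : value.raw()) {
      put.add(kv);  // copy every cell of the source row as-is
    }
    outputTable.put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    outputTable.flushCommits();
    outputTable.close();
  }
}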


On Mar 24, 2013, at 7:49 AM, David Koch  wrote:

> Hello J-D,
> 
> Thanks, it was instructive to look at the source. However, I am now stuck
> with getting HBase to honor the "hbase.mapred.output.quorum" setting. I
> opened a separate topic for this.
> 
> Regards,
> 
> /David
> 
> 
> On Mon, Mar 18, 2013 at 11:26 PM, Jean-Daniel Cryans 
> wrote:
> 
>> Checkout how CopyTable does it:
>> 
>> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java
>> 
>> J-D
>> 
>> On Mon, Mar 18, 2013 at 3:09 PM, David Koch  wrote:
>>> Hello,
>>> 
>>> Is it possible to run a M/R on cluster A over a table that resides on
>>> cluster B with output to a table on cluster A? If so, how?
>>> 
>>> I am interested in doing this for the purpose of copying part of a table
>>> from B to A. Cluster B is a production environment, cluster A is a slow
>>> test platform. I do not want the M/R to run on B since it would block
>>> precious slots on this cluster. Otherwise I could just run CopyTable on
>>> cluster B and specify cluster A as output quorum.
>>> 
>>> Could this work by pointing the client configuration at the
>> mapred-site.xml
>>> of cluster A and the hdfs-site.xml and hbase-site.xml of cluster B? In
>> this
>>> scenario - in order to output to cluster A I guess I'd have to set
>>> TableOutputFormat.QUORUM_ADDRESS to cluster A.
>>> 
>>> I use a client configuration generated by CDH4 and there are some other
>>> files floating around - such as core-site.xml, not sure what to do with
>>> that.
>>> 
>>> Thank you,
>>> 
>>> /David
>> 



Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Michael Segel
I think the problem is that Wei has been reading some stuff in blogs and that's 
why he has such a large region size to start with. 

So if he manually splits the regions and drops the region size to something more 
appropriate... 

Or if he unloads the table, drops the table, recreates the table with a smaller, 
more reasonable region size, and reloads... he'd be better off. 


On Mar 25, 2013, at 6:20 AM, Jean-Marc Spaggiari  
wrote:

> Hi Wei,
> 
> Have you looked at MAX_FILESIZE? If your table is 1TB in size, and you
> have 10 RS and want 12 regions per server, you can set this to
> 1TB/(10x12) and you will get at least that many regions (and maybe even a
> bit more).
> 
> JM
> 
> 2013/3/25 Lu, Wei :
>> We are facing large region sizes but a small number of regions per table: 10 region 
>> servers, each with only one region of size over 10G, while the map slot count of 
>> each task tracker is 12. We are planning to ‘split’ the start/stop key ranges of 
>> large table regions into more map tasks, so that we can make better use of 
>> mapreduce resources (currently only 1 of the 12 map slots is used). I have some 
>> ideas below on how to split; please give me comments or advice.
>> We are considering implementing a TableInputFormat that optimizes the method:
>> @Override
>> public List<InputSplit> getSplits(JobContext context) throws IOException
>> Following is an idea:
>> 
>> 1)  Split the start/stop key range based on a threshold or the average region size
>> Set a threshold t1; collect each region’s size, and if a region’s size is larger 
>> than t1, then ‘split’ the range [startkey, stopkey) of the region into N = 
>> {region size} / t1 sub-ranges: [startkey, stopkey1), [stopkey1, 
>> stopkey2),….,[stopkeyN-1, stopkey);
>> As for t1, we could set it as we like, or leave it as the average of all 
>> region sizes. We would set it to a small value when each region is very 
>> large, so that the ‘split’ will happen;
>> 
>> 2)  Get split keys by sampling HFile block keys
>> As for stopkey1, …stopkeyN-1, HBase doesn’t supply APIs to get them; 
>> only Pair<byte[][], byte[][]> getStartEndKeys() is given, to get the start/stop 
>> keys of the regions. 1) We could roughly calculate them, or 2) we can 
>> directly get each store file’s block keys through HFile.Reader and merge-sort 
>> them. Then we can do sampling.
>> Does this method make sense?
>> 
>> Thanks,
>> Wei
>> 
> 



Re: java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Ted Yu
What version of HBase are you using ?

Did you enable short circuit read in hadoop ?

Thanks

On Mon, Mar 25, 2013 at 4:52 AM, Dhanasekaran Anbalagan
wrote:

> Hi Guys,
>
> I have a problem with my HBase server; it says java.lang.OutOfMemoryError:
> Direct buffer memory
> I am new to HBase. How do I solve this issue?
>
> This is my stack trace:
> http://paste.ubuntu.com/5646088/
>
>
> -Dhanasekaran.
> Did I learn something today? If not, I wasted it.
>


RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Liyin Tang
The block cache holds uncompressed data, while the OS page cache contains the 
compressed data. Unless the request pattern is a full-table sequential scan, the 
block cache is still quite useful. I think the size of the block cache should be the 
amount of hot data we want to retain within a compaction cycle, which is quite hard 
to estimate in some use cases. 


Thanks a lot
Liyin
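
As a side note to the sizing question, the block cache can be bypassed per scan and 
disabled per column family, which gives some control over how much to lean on it 
versus the OS page cache. A rough sketch; the table and family names are made up for 
illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BlockCacheControl {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Per scan: a full-table scan can skip the block cache so it does not
    // evict the hot working set.
    HTable table = new HTable(conf, "events");          // placeholder table name
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // process the row...
    }
    scanner.close();
    table.close();

    // Per family: a rarely-read family can be kept out of the block cache
    // entirely when the table is created.
    HTableDescriptor desc = new HTableDescriptor("cold_table");   // placeholder
    HColumnDescriptor cold = new HColumnDescriptor("raw");        // placeholder
    cold.setBlockCacheEnabled(false);
    desc.addFamily(cold);
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc);
    admin.close();
  }
}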

From: lars hofhansl [la...@apache.org]
Sent: Saturday, March 23, 2013 10:20 PM
To: user@hbase.apache.org
Subject: Re: Does HBase RegionServer benefit from OS Page Cache

Interesting.

> 2) The blocks in the block cache will be naturally invalid quickly after the 
> compactions.

Should one keep the block cache small in order to increase the OS page cache?

Does your data suggest we should not use the block cache at all?


Thanks.

-- Lars




 From: Liyin Tang 
To: user@hbase.apache.org
Sent: Saturday, March 23, 2013 9:44 PM
Subject: Re: Does HBase RegionServer benefit from OS Page Cache

We (Facebook) are closely monitoring the OS page cache hit ratio in the
production environments. My experience is if your data access pattern is
very random, then the OS page cache won't help you so much even though the
data locality is very high. On the other hand, if the requests are always
against the recent data points, then the page cache hit ratio could be much
higher.

Actually, there are lots of optimizations that could be done in HDFS. For
example, we are working on using fadvise to drop the 2nd/3rd replica data from the
OS page cache, which could potentially improve your effective OS page cache capacity
by 3X. Also, by taking advantage of tier-based compaction plus fadvise in HDFS,
the region server could keep more hot data in the OS page cache based on the
read access pattern.

Another separate point is that we probably should NOT rely on the
memstore/block cache to keep hot data. 1) The more data in the memstore,
the more data the region server needs to recover after a server failure,
so the tradeoff is the recovery time. 2) The blocks in the block cache
quickly become invalid after compactions, so the region server
probably won't benefit from a large JVM heap at all.

Thanks a lot
Liyin

On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu  wrote:

> Coming up is the following enhancement which would make MSLAB even better:
>
> HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB
>
> FYI
>
> On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta  >wrote:
>
> > Thanks a lot for the explanation. It's good to know that MSLAB is stable
> > and safe to enable (we don't have it enabled right now; we're using 0.92).
> > This would allow us to allocate memory to HBase more freely. I really
> > enjoyed the depth of explanation from both Enis and J-D. I was indeed
> > mistakenly referring to the HFile as HLog; fortunately you were still able
> > to understand my question.
> >
> > Thanks,
> > Pankaj
> > On Mar 21, 2013, at 1:28 PM, Enis Söztutar  wrote:
> >
> > > I think the page cache is not totally useless, but as long as you can
> > > control the GC, you should prefer the block cache. Some of the reasons off
> > > the top of my head:
> > > - In case of a cache hit, for the OS cache, you have to go through the DN
> > > layer (an RPC if ssr is disabled), do a kernel jump, and read using libc's
> > > read(), whereas for reading a block from the block cache only the HBase
> > > process is involved. There is no process switch and no kernel
> > > jumps.
> > > - The read access path is optimized per hfile block. FS page boundaries
> > > and hfile block boundaries are not aligned at all.
> > > - There is very little control over the page cache to cache / not cache
> > > based on expected access patterns. For example, we can mark META region
> > > blocks, some column families, and hfile index blocks as always cached or
> > > cached with high priority. Also, for full table scans, we can explicitly
> > > disable block caching so as not to trash the current working set. With the
> > > OS page cache, you do not have this control.
> > >
> > > Enis
> > >
> > >
> > > On Wed, Mar 20, 2013 at 10:30 AM, Jean-Daniel Cryans <
> > > jdcry...@apache.org> wrote:
> > >
> > >> First, MSLAB has been enabled by default since 0.92.0 as it was deemed
> > >> stable enough. So, unless you are on 0.90, you are already using it.
> > >>
> > >> Also, I'm not sure why you are referencing the HLog in your first
> > >> paragraph in the context of reading from disk, because the HLogs are
> > >> rarely read (only on recovery). Maybe you meant HFile?
> > >>
> > >> In any case, your email covers most arguments except for one:
> > >> checksumming. Retrieving a block from HDFS, even when using short
> > >> circuit reads to go directly to the OS instead of passing through the
> > >> DN, will take quite a bit more time than reading directly from the
> > >> block cache. This is why even if you disable block caching on a family
> > >> that the index and root 

java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Dhanasekaran Anbalagan
Hi Guys,

I have a problem with my HBase server; it says java.lang.OutOfMemoryError:
Direct buffer memory
I am new to HBase. How do I solve this issue?

This is my stack trace:
http://paste.ubuntu.com/5646088/


-Dhanasekaran.
Did I learn something today? If not, I wasted it.


Re: hbase increments and hadoop attempts

2013-03-25 Thread Bryan Beaudreault
Increments are not idempotent, so yes, you will double-increment the set of
increments that succeeded in the first attempt(s). If you care about that,
you're better off not using the Increment interface and instead using two
jobs: one that does a Get of the current value, adds the offset, and then
passes the result through to the next job. The next job would simply save the locally
incremented values with a series of Puts. This way you can re-run either
job any number of times without double-incrementing anything.
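
A minimal sketch of that read-then-put pattern, with hypothetical table, family and 
qualifier names; in the two-job scheme the Get happens in the first job and the Put 
in the second, so either step can be re-run (or speculatively re-attempted) without 
double-counting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IdempotentCounterUpdate {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");            // hypothetical table
    byte[] row = Bytes.toBytes("page-123");                 // hypothetical row key
    byte[] cf = Bytes.toBytes("c");
    byte[] qual = Bytes.toBytes("hits");
    long delta = 42L;                                       // offset computed by the job

    // "Job 1": read the current value once, before any writes happen.
    Result current = table.get(new Get(row));
    byte[] bytes = current.getValue(cf, qual);
    long base = (bytes == null) ? 0L : Bytes.toLong(bytes);

    // "Job 2": write the absolute value with a plain Put. Re-running this step
    // simply rewrites the same value, so duplicate attempts cannot double-count.
    Put put = new Put(row);
    put.add(cf, qual, Bytes.toBytes(base + delta));
    table.put(put);

    table.close();
  }
}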


On Mon, Mar 25, 2013 at 10:17 AM, prakash kadel wrote:

> hi everyone,
>   when i launch my mapreduce jobs to increment counters in hbase i
> sometimes have maps
> with multiple attempts like:
> attempt_201303251722_0161_m_74_0
> attempt_201303251722_0161_m_74_1
>
> if there are multiple attempts running and the first one gets
> completed successfully,
> what will happen to the increments made by the second attempt?
>
> will there be unnecessary increments due to the second attempt running?
> thank you.
>
> Sincerely
> Prakash
>


hbase increments and hadoop attempts

2013-03-25 Thread prakash kadel
hi everyone,
  when i launch my mapreduce jobs to increment counters in hbase i
sometimes have maps
with multiple attempts like:
attempt_201303251722_0161_m_74_0
attempt_201303251722_0161_m_74_1

if there are multiple attempts running and the first one gets
completed successfully,
what will happen to the increments made by the second attempt?

will there be unnecessary increments due to the second attempt running?
thank you.

Sincerely
Prakash


Re: Compaction timing and recovery from failure

2013-03-25 Thread ramkrishna vasudevan
My question is this.  If a compaction fails due to a regionserver loss
mid-compaction, does the regionserver that picks up the region continue
where the first left off?  Or does it have to start from scratch?
-> The answer to this is: the compaction starts again from the beginning on the new
regionserver.

Regards
Ram

On Mon, Mar 25, 2013 at 7:33 PM, Brennon Church  wrote:

> Everyone,
>
> I recently had a couple compactions, minors that were promoted to majors,
> take 8 and 10 minutes each.  I eventually killed the regionserver
> underneath them as I'd never seen compactions last that long before.  In
> looking through the logs from the regionserver that was killed and watching
> one of the regions after it was moved over, I saw that it took about 3
> minutes to compact on the second regionserver.  I also noticed that the
> temporary location for the newly compacted storefile matched in both the
> first (failed/killed) and second (succeeded) regionserver log.
>
> My question is this.  If a compaction fails due to a regionserver loss
> mid-compaction, does the regionserver that picks up the region continue
> where the first left off?  Or does it have to start from scratch?
>
> Basically, I'm wondering if waiting an additional 3 minutes or so would
> have finally worked through the region on the first server, or if it was
> truly stuck for some other, unknown reason.
>
> Thanks!
>
> --Brennon
>
>


Compaction timing and recovery from failure

2013-03-25 Thread Brennon Church

Everyone,

I recently had a couple compactions, minors that were promoted to 
majors, take 8 and 10 minutes each.  I eventually killed the 
regionserver underneath them as I'd never seen compactions last that 
long before.  In looking through the logs from the regionserver that was 
killed and watching one of the regions after it was moved over, I saw 
that it took about 3 minutes to compact on the second regionserver.  I 
also noticed that the temporary location for the newly compacted 
storefile matched in both the first (failed/killed) and second 
(succeeded) regionserver log.


My question is this.  If a compaction fails due to a regionserver loss 
mid-compaction, does the regionserver that picks up the region continue 
where the first left off?  Or does it have to start from scratch?


Basically, I'm wondering if waiting an additional 3 minutes or so would 
have finally worked through the region on the first server, or if it was 
truly stuck for some other, unknown reason.


Thanks!

--Brennon



Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Jean-Marc Spaggiari
Hi Wei,

Have you looked at MAX_FILESIZE? If your table is 1TB in size, and you
have 10 RS and want 12 regions per server, you can set this to
1TB/(10x12) and you will get at least that many regions (and maybe even a
bit more).

JM
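
For illustration, the arithmetic above (1 TB spread over 10 RS x 12 regions, roughly 
8.5 GB per region) can be applied through HTableDescriptor when creating or altering 
the table; the table and family names below are placeholders:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class RegionSizing {
  public static void main(String[] args) {
    long tableSize = 1024L * 1024 * 1024 * 1024;   // ~1 TB of table data
    int regionServers = 10;
    int regionsPerServer = 12;
    long maxFileSize = tableSize / (regionServers * regionsPerServer); // ~8.5 GB

    HTableDescriptor desc = new HTableDescriptor("big_table");  // placeholder name
    desc.addFamily(new HColumnDescriptor("d"));                 // placeholder family
    desc.setMaxFileSize(maxFileSize); // a region splits once a store grows past this
    System.out.println("MAX_FILESIZE = " + maxFileSize + " bytes");
  }
}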

2013/3/25 Lu, Wei :
> We are facing large region sizes but a small number of regions per table: 10 region 
> servers, each with only one region of size over 10G, while the map slot count of 
> each task tracker is 12. We are planning to ‘split’ the start/stop key ranges of 
> large table regions into more map tasks, so that we can make better use of 
> mapreduce resources (currently only 1 of the 12 map slots is used). I have some 
> ideas below on how to split; please give me comments or advice.
> We are considering implementing a TableInputFormat that optimizes the method:
> @Override
> public List<InputSplit> getSplits(JobContext context) throws IOException
> Following is an idea:
>
> 1)  Split the start/stop key range based on a threshold or the average region size
> Set a threshold t1; collect each region’s size, and if a region’s size is larger 
> than t1, then ‘split’ the range [startkey, stopkey) of the region into N = 
> {region size} / t1 sub-ranges: [startkey, stopkey1), [stopkey1, 
> stopkey2),….,[stopkeyN-1, stopkey);
> As for t1, we could set it as we like, or leave it as the average of all 
> region sizes. We would set it to a small value when each region is very 
> large, so that the ‘split’ will happen;
>
> 2)  Get split keys by sampling HFile block keys
> As for stopkey1, …stopkeyN-1, HBase doesn’t supply APIs to get them; 
> only Pair<byte[][], byte[][]> getStartEndKeys() is given, to get the start/stop 
> keys of the regions. 1) We could roughly calculate them, or 2) we can 
> directly get each store file’s block keys through HFile.Reader and merge-sort 
> them. Then we can do sampling.
> Does this method make sense?
>
> Thanks,
> Wei
>


Re: Evenly splitting the table

2013-03-25 Thread Michael Segel
@Aaron, 

You said you're using a salt, which would imply that your number is random and 
not derived from the base key.  (Where base key is the key prior to being 
hashed. )

Is that the case, or do you mean that Kiji is taking the first two bytes of the 
hash and prepending it to the key? 


On Mar 20, 2013, at 6:55 PM, Aaron Kimball  wrote:

> Hi Cole,
> 
> How are your keys structured? In Kiji, we default to using hashed row keys
> where each key starts with two bytes of salt. This makes it a lot easier to
> pre-split the table since you can make stronger guarantees about the key
> distribution.
> 
> If your keys are "raw" text like, say, plaintext email addresses, it is
> significantly more difficult to guess the right splits a priori.
> 
> cheers,
> - Aaron
> 
> 
> 
> On Wed, Mar 20, 2013 at 3:43 PM, Ted Yu  wrote:
> 
>> Take a look at TestAdmin#testCreateTableRPCTimeOut() where
>> hbaseadmin.createTable() is called.
>> 
>> bq. Is there a way to go about splitting the entire table without having
>> specific start and end keys?
>> 
>> I don't think so.
>> 
>> On Wed, Mar 20, 2013 at 3:32 PM, Cole  wrote:
>> 
>>> I was wondering how I can go about evenly splitting an entire table in
>>> HBase during table creation[1]. I tried providing the empty byte arrays
>>> HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW
>>> as parameters to the method I linked below, and got an error: "Start
>>> key must be smaller than end key". Is there a way to go about splitting
>>> the entire table without having specific start and end keys? Thanks in
>>> advance.
>>> 
>>> 
>>> [1]
>>> 
>>> 
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html
>>> #createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[], byte[],
>> int)
>>> 
>>> 
>> 
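
To round out the pre-split discussion above, a rough sketch of creating a pre-split 
table whose keys start with a fixed-width salt prefix. The salting shown here (first 
byte of an MD5 hash prepended to the logical key) is only one common variant and not 
necessarily what Kiji does; the table name, family and region count are made up for 
illustration:

import java.security.MessageDigest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedPreSplit {

  // Prepend one byte of the MD5 hash of the logical key, so rows spread
  // uniformly over the 256 possible prefixes.
  static byte[] saltedKey(String logicalKey) throws Exception {
    byte[] hash = MessageDigest.getInstance("MD5").digest(Bytes.toBytes(logicalKey));
    return Bytes.add(new byte[] { hash[0] }, Bytes.toBytes(logicalKey));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("salted_table"); // placeholder
    desc.addFamily(new HColumnDescriptor("d"));                   // placeholder

    // Because the first key byte is uniformly distributed, evenly spaced
    // one-byte prefixes are valid split points: N regions need N - 1 split keys.
    int numRegions = 16;                                          // illustrative
    byte[][] splits = new byte[numRegions - 1][];
    for (int i = 1; i < numRegions; i++) {
      splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}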



‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Lu, Wei
We are facing large region sizes but a small number of regions per table: 10 region 
servers, each with only one region of size over 10G, while the map slot count of each 
task tracker is 12. We are planning to ‘split’ the start/stop key ranges of large 
table regions into more map tasks, so that we can make better use of mapreduce 
resources (currently only 1 of the 12 map slots is used). I have some ideas below on 
how to split; please give me comments or advice.
We are considering implementing a TableInputFormat that optimizes the method:
@Override
public List<InputSplit> getSplits(JobContext context) throws IOException
Following is an idea:

1)  Split the start/stop key range based on a threshold or the average region size
Set a threshold t1; collect each region’s size, and if a region’s size is larger than 
t1, then ‘split’ the range [startkey, stopkey) of the region into N = 
{region size} / t1 sub-ranges: [startkey, stopkey1), [stopkey1, 
stopkey2),….,[stopkeyN-1, stopkey);
As for t1, we could set it as we like, or leave it as the average of all 
region sizes. We would set it to a small value when each region is very 
large, so that the ‘split’ will happen;

2)  Get split keys by sampling HFile block keys
As for stopkey1, …stopkeyN-1, HBase doesn’t supply APIs to get them; 
only Pair<byte[][], byte[][]> getStartEndKeys() is given, to get the start/stop keys 
of the regions. 1) We could roughly calculate them, or 2) we can directly 
get each store file’s block keys through HFile.Reader and merge-sort them. Then 
we can do sampling.
Does this method make sense?

Thanks,
Wei
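
A rough sketch of idea 1): subclass TableInputFormat and cut each region's 
[startKey, stopKey) range into equal sub-ranges so that more map tasks are created. 
The number of sub-splits is fixed here for simplicity; in practice it would be 
derived from the region size and the threshold t1:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class SubdividingTableInputFormat extends TableInputFormat {

  private static final int SUB_SPLITS_PER_REGION = 4; // illustrative value

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> result = new ArrayList<InputSplit>();
    for (InputSplit split : super.getSplits(context)) {
      TableSplit ts = (TableSplit) split;
      byte[] start = ts.getStartRow();
      byte[] stop = ts.getEndRow();
      if (start.length == 0 || stop.length == 0) {
        // First/last region: no fixed endpoint to interpolate between.
        result.add(ts);
        continue;
      }
      // Bytes.split returns the start key, the intermediate keys and the stop key.
      byte[][] keys = Bytes.split(start, stop, SUB_SPLITS_PER_REGION - 1);
      if (keys == null) {
        // Range too narrow to subdivide; keep the original split.
        result.add(ts);
        continue;
      }
      for (int i = 0; i < keys.length - 1; i++) {
        result.add(new TableSplit(ts.getTableName(), keys[i], keys[i + 1],
            ts.getRegionLocation()));
      }
    }
    return result;
  }
}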



region servers become dead after some time; below is the zookeeper log

2013-03-25 Thread gaurhari dass
client /107.108.188.11:38371
2013-03-25 12:09:55,213 INFO
org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
connection from /107.108.188.11:38373
2013-03-25 12:09:55,214 INFO org.apache.zookeeper.server.ZooKeeperServer:
Client attempting to establish new session at /107.108.188.11:38373
2013-03-25 12:09:55,228 INFO org.apache.zookeeper.server.ZooKeeperServer:
Established session 0x13da02ddb8b000d with negotiated timeout 18 for
client /107.108.188.11:38373
2013-03-25 12:09:55,283 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x13da02ddb8b000d
2013-03-25 12:09:55,299 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38373 which had
sessionid 0x13da02ddb8b000d
2013-03-25 12:10:05,956 WARN org.apache.zookeeper.server.NIOServerCnxn:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x13da02ddb8b000c, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:679)
2013-03-25 12:10:05,956 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38371 which had
sessionid 0x13da02ddb8b000c
2013-03-25 12:12:57,000 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x13da02ddb8b000c, timeout of 18ms exceeded
2013-03-25 12:12:57,000 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x13da02ddb8b000c
2013-03-25 12:13:10,220 INFO
org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
connection from /107.108.188.11:38377
2013-03-25 12:13:10,222 INFO org.apache.zookeeper.server.ZooKeeperServer:
Client attempting to establish new session at /107.108.188.11:38377
2013-03-25 12:13:10,242 INFO org.apache.zookeeper.server.ZooKeeperServer:
Established session 0x13da02ddb8b000e with negotiated timeout 18 for
client /107.108.188.11:38377
2013-03-25 12:13:10,443 INFO
org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
connection from /107.108.188.11:38379
2013-03-25 12:13:10,445 INFO org.apache.zookeeper.server.ZooKeeperServer:
Client attempting to establish new session at /107.108.188.11:38379
2013-03-25 12:13:10,454 INFO org.apache.zookeeper.server.ZooKeeperServer:
Established session 0x13da02ddb8b000f with negotiated timeout 18 for
client /107.108.188.11:38379
2013-03-25 12:13:10,524 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x13da02ddb8b000f
2013-03-25 12:13:10,542 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38379 which had
sessionid 0x13da02ddb8b000f
2013-03-25 12:13:16,326 WARN org.apache.zookeeper.server.NIOServerCnxn:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x13da02ddb8b000e, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:679)
2013-03-25 12:13:16,327 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38377 which had
sessionid 0x13da02ddb8b000e
2013-03-25 12:13:31,584 WARN org.apache.zookeeper.server.NIOServerCnxn:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x13da02ddb8b000a, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:679)
2013-03-25 12:13:31,585 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38362 which had
sessionid 0x13da02ddb8b000a
2013-03-25 12:13:31,585 WARN org.apache.zookeeper.server.NIOServerCnxn:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x13da02ddb8b0008, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:679)
2013-03-25 12:13:31,586 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /107.108.188.11:38358 which had
sessionid 0x13da02ddb8b0008
2013-03-25 12:15:42,000 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x13da02ddb8b0008, timeout of 18ms exceeded
2013-03-25 12:15:42,001 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x13da02d