Re: Error about rs block seek

2013-05-13 Thread Bing Jiang
Hi,all
Before the exception stack, there is an Error log:
2013-05-13 00:00:14,491 ERROR
org.apache.hadoop.hbase.io.hfile.HFileReaderV2: Current pos = 32651;
currKeyLen = 45; currValLen = 80; block limit = 32775; HFile name =
1f96183d55144c058fa2a05fe5c0b814; currBlock currBlockOffset = 33550830

And the operation is the scanner's next().
Current pos + currKeyLen + currValLen > block limit:
32651 + 45 + 80 = 32776 > 32775. In my table config the blocksize is set to
32768, and after I changed the blocksize from 64k (the default value) to 32k,
many of these error logs appeared.

I use 0.94.3. Can someone tell me about the influence of the blocksize setting?

Tks.
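
(For illustration only: a minimal sketch, not taken from this thread, of how a per-family
block size like the 32k above can be set from the 0.94 Java client. The table/family names
mirror the CrawlInfo/CrawlStats descriptor quoted below; everything else is assumed.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class CreateTableWithSmallBlocks {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Family settings roughly mirroring the descriptor quoted later in this thread.
        HColumnDescriptor family = new HColumnDescriptor("CrawlStats");
        family.setBlocksize(32 * 1024);                        // HFile data block size (default 64k)
        family.setMaxVersions(1);
        family.setBloomFilterType(StoreFile.BloomType.ROWCOL);

        HTableDescriptor desc = new HTableDescriptor("CrawlInfo");
        desc.addFamily(family);
        admin.createTable(desc);
        admin.close();
      }
    }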




2013/5/13 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com

 Your TTL is negative here 'TTL = '-1','.

 Any reason for it to be negative? This could be a possible reason.  Not
 sure..

 Regards
 Ram


 On Mon, May 13, 2013 at 7:20 AM, Bing Jiang jiangbinglo...@gmail.com
 wrote:

  hi, Ted.
 
  No data block encoding, our table config below:
 
  User Table Description
  CrawlInfo http://10.100.12.33:8003/table.jsp?name=CrawlInfo
  {NAME => 'CrawlInfo', DEFERRED_LOG_FLUSH => 'true', MAX_FILESIZE => '34359738368',
  FAMILIES => [{NAME => 'CrawlStats', BLOOMFILTER => 'ROWCOL', CACHE_INDEX_ON_WRITE => 'true',
  TTL => '-1', CACHE_DATA_ON_WRITE => 'true', CACHE_BLOOMS_ON_WRITE => 'true',
  VERSIONS => '1', BLOCKSIZE => '32768'}]}
 
 
 
  2013/5/13 Bing Jiang jiangbinglo...@gmail.com
 
   Hi, JM.
   Our jdk version is 1.6.0_38
  
  
   2013/5/13 Jean-Marc Spaggiari jean-m...@spaggiari.org
  
   Hi Bing,
  
   Which JDK are you using?
  
   Thanks,
  
   JM
  
   2013/5/12 Bing Jiang jiangbinglo...@gmail.com
  
Yes, we use hbase-0.94.3 , and  we change block.size from 64k to
 32k.
   
   
2013/5/13 Ted Yu yuzhih...@gmail.com
   
 Can you tell us the version of hbase you are using ?
 Did this problem happen recently ?

 Thanks

 On May 12, 2013, at 6:25 PM, Bing Jiang jiangbinglo...@gmail.com
 
wrote:

  Hi, all.
  In our hbase cluster, there are many logs like below:
 
  2013-05-13 00:00:04,161 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
  java.lang.IllegalArgumentException
  at java.nio.Buffer.position(Buffer.java:216)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:882)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:753)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:487)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:131)
  at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2073)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3412)
  at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1642)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1634)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4230)
  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4204)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2025)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3461)
  at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 
 
 
  and Table config:
 
 
  Can anyone tell me how I can find the reason about this?
 
  --
  Bing Jiang
  weibo: http://weibo.com/jiangbinglover
  BLOG: http://blog.sina.com.cn/jiangbinglover
  National Research Center for Intelligent Computing Systems
  Institute of Computing technology
  Graduate University of Chinese Academy of Science


Block size of HBase files

2013-05-13 Thread Praveen Bysani
Hi,

I have the dfs.block.size value set to 1 GB in my cluster configuration. I
have around 250 GB of data stored in HBase on this cluster. But when I check
the number of blocks, it doesn't correspond to the block size value I set.
From what I understand I should only have ~250 blocks. Instead, when I ran
fsck on /hbase/<table-name>, I got the following:

Status: HEALTHY
 Total size:265727504820 B
 Total dirs:1682
 Total files:   1459
 Total blocks (validated):  1459 (avg. block size 182129886 B)
 Minimally replicated blocks:   1459 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 3.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  5
 Number of racks:   1

Are there any other configuration parameters that need to be set ?

-- 
Regards,
Praveen Bysani
http://www.praveenbysani.com


Re: Block size of HBase files

2013-05-13 Thread Amandeep Khurana
On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani praveen.ii...@gmail.comwrote:

 Hi,

 I have the dfs.block.size value set to 1 GB in my cluster configuration.


Just out of curiosity - why do you have it set at 1GB?


 I
 have around 250 GB of data stored in hbase over this cluster. But when i
 check the number of blocks, it doesn't correspond to the block size value i
 set. From what i understand i should only have ~250 blocks. But instead
 when i did a fsck on the /hbase/table-name, i got the following

 Status: HEALTHY
  Total size:265727504820 B
  Total dirs:1682
  Total files:   1459
  Total blocks (validated):  1459 (avg. block size 182129886 B)
  Minimally replicated blocks:   1459 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 3.0
  Corrupt blocks:0
  Missing replicas:  0 (0.0 %)
  Number of data-nodes:  5
  Number of racks:   1

 Are there any other configuration parameters that need to be set ?


What is your HFile size set to? The HFiles that get persisted would be
bound by that number. Thereafter each HFile would be split into blocks, the
size of which you configure using the dfs.block.size configuration
parameter.



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com



RE: How to implement this check put and then update something logic?

2013-05-13 Thread Liu, Raymond
Well, this did come from a graph domain.

However, I think this could be a common problem whenever you need to update 
something according to the original value and a simple checkAndPut on a single 
value won't work.

Another example: if you want to implement something like UPDATE, you want to 
know whether a new value was inserted or an old value was updated. That isn't 
easy today: you checkAndPut on null; if the cell is not null, you Get the value and 
checkAndPut on that value, since you want to make sure the column is still 
there. If that fails, you loop back to the null check.

So I think a little enhancement of the current HBase atomic operations could 
greatly improve usability for this kind of problem. Or is there already a 
solution for this type of issue?
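
(For illustration, a minimal sketch of the retry loop described above, written against the
0.94 client API. The helper name putAndGetOld and its arguments are hypothetical; this is
not an existing HBase call, just the two-step checkAndPut pattern spelled out.)

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class GetAndPutLoop {
      // Hypothetical helper: writes newValue and returns the value that was there
      // before our Put won, or null if the cell did not exist (a fresh insert).
      static byte[] putAndGetOld(HTable table, byte[] row, byte[] family,
                                 byte[] qualifier, byte[] newValue) throws IOException {
        Put put = new Put(row);
        put.add(family, qualifier, newValue);
        while (true) {
          // Step 1: assume the cell does not exist yet.
          if (table.checkAndPut(row, family, qualifier, null, put)) {
            return null;                                   // new value inserted
          }
          // Step 2: the cell existed; read it and replace exactly that value.
          byte[] old = table.get(new Get(row).addColumn(family, qualifier))
                            .getValue(family, qualifier);
          if (old == null) {
            continue;                                      // deleted in between, back to step 1
          }
          if (table.checkAndPut(row, family, qualifier, old, put)) {
            return old;                                    // old value updated
          }
          // changed between the Get and the checkAndPut; retry
        }
      }
    }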

 
 Maybe this problem is more in the graph domain? I know that there are
 projects aimed at representing graphs at large scale better. I'm saying this
 since you have one ID referencing another ID (using target ID).
 
 
 
 On May 10, 2013, at 11:47 AM, Liu, Raymond raymond@intel.com
 wrote:
 
  Thanks, seems there are no other better solution?
 
  Really need a GetAndPut atomic op here ...
 
 
  You can do this by looping over a checkAndPut operation until it succeeds.
 
  -Mike
 
  On Thu, May 9, 2013 at 8:52 PM, Liu, Raymond raymond@intel.com
  wrote:
  Any suggestion?
 
 
  Hi
 
    Say I have four fields for one record: id, status, targetid, and 
   count.
    Status is on or off, targetid can reference another id, and
   count records the number of "on" statuses for all targetids from the same 
   id.
 
    The records can be added / deleted, or updated to change the status.
 
    I could put count in another table, or put it in the same
   table; it doesn't matter, as long as it works.
 
    My question is: how can I ensure the correctness of the count
   field when multiple clients update the table concurrently?
 
    The closest thing I can think of is checkAndPut, but I will
   need two steps to find out the change of the count, since checkAndPut
   etc. can only test a single value and only with the EQUAL comparator, so I
   can only check against null first, then against on or off. If things
   change during these two steps, I need to retry from the first step until
   it succeeds. This
   could be bad when a lot of concurrent ops are going on.
 
    And then I need to update count with checkAndIncrement. If
   the above problem is solved, the order of -1 / +1 might not
   be important for the final result, but at some intermediate time the
   column might not reflect the real count of that moment.
 
    I know this kind of transaction is not the target of HBase and
   the app should take care of it. Then what's the best practice for
   this? Any quick, simple solution for my problem? A client RowLock
   could solve this issue, but it seems to me that it is not safe, not
   recommended, and
   deprecated?
 
    Btw, is it possible or practical to implement something like
   PutAndGet, which puts the new row and returns the old row back to the
   client?
   That would help a lot for my case.
 
  Best Regards,
  Raymond Liu
 



Re: Error about rs block seek

2013-05-13 Thread Anoop John
 Current pos = 32651;
currKeyLen = 45; currValLen = 80; block limit = 32775

This means that after the current position we need at least 45 + 80 + 4 (key
length, stored as 4 bytes) + 4 (value length, 4 bytes) more bytes, so the
block limit should have been at least 32784. If the memstoreTS is also
written with this KV, a few more bytes still.
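
(Just to spell out the arithmetic, a throwaway sketch; the constants below assume the plain
0.94 KeyValue layout inside a data block, 4-byte key length + 4-byte value length + key +
value bytes. This is not the actual HFileReaderV2 code.)

    public class BlockSeekArithmetic {
      public static void main(String[] args) {
        final int KEY_LEN_SIZE = 4, VAL_LEN_SIZE = 4;      // length prefixes of a KeyValue
        int pos = 32651, keyLen = 45, valLen = 80, limit = 32775;
        // Bytes needed to hold the current KV starting at pos
        // (more still if a memstoreTS is appended after the value):
        int needed = pos + KEY_LEN_SIZE + VAL_LEN_SIZE + keyLen + valLen;
        System.out.println("needed=" + needed + ", limit=" + limit);   // needed=32784 > 32775
      }
    }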

Do you use HBase-handled checksums?

-Anoop-

On Mon, May 13, 2013 at 12:00 PM, Bing Jiang jiangbinglo...@gmail.comwrote:

 Hi,all
 Before the exception stack, there is an Error log:
 2013-05-13 00:00:14,491 ERROR
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2: Current pos = 32651;
 currKeyLen = 45; currValLen = 80; block limit = 32775; HFile name =
 1f96183d55144c058fa2a05fe5c0b814; currBlock currBlockOffset = 33550830

 And the operation is scanner's next.
  Current pos + currKeyLen + currValLen > block limit
  32651+45 +80 = 32776 > 32775 , and in my table configs, set blocksize
 32768, and when I change the value from blocksize from 64k(default value)
 to 32k, so many error logs being found.

 I use 0.94.3, can someone tell me the influence of blocksize setting.

 Tks.




 2013/5/13 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com

  Your TTL is negative here 'TTL = '-1','.
 
  Any reason for it to be negative? This could be a possible reason.  Not
  sure..
 
  Regards
  Ram
 
 
  On Mon, May 13, 2013 at 7:20 AM, Bing Jiang jiangbinglo...@gmail.com
  wrote:
 
   hi, Ted.
  
   No data block encoding, our table config below:
  
   User Table Description
   CrawlInfohttp://10.100.12.33:8003/table.jsp?name=CrawlInfo {NAME
   = 'CrawlInfo', DEFERRED_LOG_FLUSH = 'true', MAX_FILESIZE =
   '34359738368', FAMILIES = [{NAME = 'CrawlStats', BLOOMFILTER =
  'ROWCOL',
   CACHE_INDEX_ON_WRITE = 'true', TTL = '-1', CACHE_DATA_ON_WRITE =
  'true',
   CACHE_BLOOMS_ON_WRITE = 'true', VERSIONS = '1', BLOCKSIZE =
 '32768'}]}
  
  
  
   2013/5/13 Bing Jiang jiangbinglo...@gmail.com
  
Hi, JM.
Our jdk version is 1.6.0_38
   
   
2013/5/13 Jean-Marc Spaggiari jean-m...@spaggiari.org
   
Hi Bing,
   
Which JDK are you using?
   
Thanks,
   
JM
   
2013/5/12 Bing Jiang jiangbinglo...@gmail.com
   
 Yes, we use hbase-0.94.3 , and  we change block.size from 64k to
  32k.


 2013/5/13 Ted Yu yuzhih...@gmail.com

  Can you tell us the version of hbase you are using ?
  Did this problem happen recently ?
 
  Thanks
 
  On May 12, 2013, at 6:25 PM, Bing Jiang 
 jiangbinglo...@gmail.com
  
 wrote:
 
   Hi, all.
   In our hbase cluster, there are many logs like below:
  
   2013-05-13 00:00:04,161 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
   java.lang.IllegalArgumentException
   at java.nio.Buffer.position(Buffer.java:216)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:882)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:753)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:487)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
   at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:131)
   at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2073)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3412)
   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1642)
   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1634)
   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4230)
   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4204)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2025)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3461)
   at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
 

Re: Error about rs block seek

2013-05-13 Thread Bing Jiang
Hi, Anoop.
We do not change the HBase checksum handling.

So I want to know: if I set the block size when creating the table, can that
cause trouble?


2013/5/13 Anoop John anoop.hb...@gmail.com

  Current pos = 32651;
 currKeyLen = 45; currValLen = 80; block limit = 32775

 This means after the cur position we need to have atleast  45+80+4(key
 length stored as 4 bytes) +4(value length 4 bytes)
 So atleast 32784 should have been the limit.  If we have memstoreTS also
 written with this KV some more bytes..

 Do u use Hbase handled checksum?

 -Anoop-

 On Mon, May 13, 2013 at 12:00 PM, Bing Jiang jiangbinglo...@gmail.com
 wrote:

  Hi,all
  Before the exception stack, there is an Error log:
  2013-05-13 00:00:14,491 ERROR
  org.apache.hadoop.hbase.io.hfile.HFileReaderV2: Current pos = 32651;
  currKeyLen = 45; currValLen = 80; block limit = 32775; HFile name =
  1f96183d55144c058fa2a05fe5c0b814; currBlock currBlockOffset = 33550830
 
  And the operation is scanner's next.
   Current pos + currKeyLen + currValLen > block limit
   32651+45 +80 = 32776 > 32775 , and in my table configs, set blocksize
  32768, and when I change the value from blocksize from 64k(default value)
  to 32k, so many error logs being found.
 
  I use 0.94.3, can someone tell me the influence of blocksize setting.
 
  Tks.
 
 
 
 
  2013/5/13 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com
 
   Your TTL is negative here 'TTL = '-1','.
  
   Any reason for it to be negative? This could be a possible reason.  Not
   sure..
  
   Regards
   Ram
  
  
   On Mon, May 13, 2013 at 7:20 AM, Bing Jiang jiangbinglo...@gmail.com
   wrote:
  
hi, Ted.
   
No data block encoding, our table config below:
   
User Table Description
CrawlInfohttp://10.100.12.33:8003/table.jsp?name=CrawlInfo {NAME
= 'CrawlInfo', DEFERRED_LOG_FLUSH = 'true', MAX_FILESIZE =
'34359738368', FAMILIES = [{NAME = 'CrawlStats', BLOOMFILTER =
   'ROWCOL',
CACHE_INDEX_ON_WRITE = 'true', TTL = '-1', CACHE_DATA_ON_WRITE =
   'true',
CACHE_BLOOMS_ON_WRITE = 'true', VERSIONS = '1', BLOCKSIZE =
  '32768'}]}
   
   
   
2013/5/13 Bing Jiang jiangbinglo...@gmail.com
   
 Hi, JM.
 Our jdk version is 1.6.0_38


 2013/5/13 Jean-Marc Spaggiari jean-m...@spaggiari.org

 Hi Bing,

 Which JDK are you using?

 Thanks,

 JM

 2013/5/12 Bing Jiang jiangbinglo...@gmail.com

  Yes, we use hbase-0.94.3 , and  we change block.size from 64k to
   32k.
 
 
  2013/5/13 Ted Yu yuzhih...@gmail.com
 
   Can you tell us the version of hbase you are using ?
   Did this problem happen recently ?
  
   Thanks
  
   On May 12, 2013, at 6:25 PM, Bing Jiang 
  jiangbinglo...@gmail.com
   
  wrote:
  
Hi, all.
In our hbase cluster, there are many logs like below:
   
    2013-05-13 00:00:04,161 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
    java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:216)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:882)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:753)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:487)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:131)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2073)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3412)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1642)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1634)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4230)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4204)

Re: Error about rs block seek

2013-05-13 Thread Anoop John
So I want to know if I set block size at the beginning of creating tables,
does something make troubles?

It should not. We have tested with different block sizes, from the default 64K
down to 8K, for testing purposes and have not come across issues like this.
Does it come only with this data, or does the issue come every time you create a
new table with a 32K block size, do some writes and then read?

-Anoop-

On Mon, May 13, 2013 at 1:36 PM, Bing Jiang jiangbinglo...@gmail.comwrote:

 hi, Anoop.
 I do not handle or change the hbase checksum.

 So I want to know if I set block size at the beginning of creating tables,
 does something make troubles?


 2013/5/13 Anoop John anoop.hb...@gmail.com

   Current pos = 32651;
  currKeyLen = 45; currValLen = 80; block limit = 32775
 
  This means after the cur position we need to have atleast  45+80+4(key
  length stored as 4 bytes) +4(value length 4 bytes)
  So atleast 32784 should have been the limit.  If we have memstoreTS also
  written with this KV some more bytes..
 
  Do u use Hbase handled checksum?
 
  -Anoop-
 
  On Mon, May 13, 2013 at 12:00 PM, Bing Jiang jiangbinglo...@gmail.com
  wrote:
 
   Hi,all
   Before the exception stack, there is an Error log:
   2013-05-13 00:00:14,491 ERROR
   org.apache.hadoop.hbase.io.hfile.HFileReaderV2: Current pos = 32651;
   currKeyLen = 45; currValLen = 80; block limit = 32775; HFile name =
   1f96183d55144c058fa2a05fe5c0b814; currBlock currBlockOffset = 33550830
  
   And the operation is scanner's next.
    Current pos + currKeyLen + currValLen > block limit
    32651+45 +80 = 32776 > 32775 , and in my table configs, set blocksize
   32768, and when I change the value from blocksize from 64k(default
 value)
   to 32k, so many error logs being found.
  
   I use 0.94.3, can someone tell me the influence of blocksize setting.
  
   Tks.
  
  
  
  
   2013/5/13 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com
  
Your TTL is negative here 'TTL = '-1','.
   
Any reason for it to be negative? This could be a possible reason.
  Not
sure..
   
Regards
Ram
   
   
On Mon, May 13, 2013 at 7:20 AM, Bing Jiang 
 jiangbinglo...@gmail.com
wrote:
   
 hi, Ted.

 No data block encoding, our table config below:

 User Table Description
 CrawlInfohttp://10.100.12.33:8003/table.jsp?name=CrawlInfo {NAME
 = 'CrawlInfo', DEFERRED_LOG_FLUSH = 'true', MAX_FILESIZE =
 '34359738368', FAMILIES = [{NAME = 'CrawlStats', BLOOMFILTER =
'ROWCOL',
 CACHE_INDEX_ON_WRITE = 'true', TTL = '-1', CACHE_DATA_ON_WRITE =
'true',
 CACHE_BLOOMS_ON_WRITE = 'true', VERSIONS = '1', BLOCKSIZE =
   '32768'}]}



 2013/5/13 Bing Jiang jiangbinglo...@gmail.com

  Hi, JM.
  Our jdk version is 1.6.0_38
 
 
  2013/5/13 Jean-Marc Spaggiari jean-m...@spaggiari.org
 
  Hi Bing,
 
  Which JDK are you using?
 
  Thanks,
 
  JM
 
  2013/5/12 Bing Jiang jiangbinglo...@gmail.com
 
   Yes, we use hbase-0.94.3 , and  we change block.size from 64k
 to
32k.
  
  
   2013/5/13 Ted Yu yuzhih...@gmail.com
  
Can you tell us the version of hbase you are using ?
Did this problem happen recently ?
   
Thanks
   
On May 12, 2013, at 6:25 PM, Bing Jiang 
   jiangbinglo...@gmail.com

   wrote:
   
 Hi, all.
 In our hbase cluster, there are many logs like below:

     2013-05-13 00:00:04,161 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
     java.lang.IllegalArgumentException
     at java.nio.Buffer.position(Buffer.java:216)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:882)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:753)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:487)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
     at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:131)
     at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2073)
     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3412)

Re: Error about rs block seek

2013-05-13 Thread ramkrishna vasudevan
Is it possible to reproduce this with a simple test case based on your
use case and data? You can share it so that we can really debug the actual
problem.
Regards
Ram


On Mon, May 13, 2013 at 1:57 PM, Anoop John anoop.hb...@gmail.com wrote:

 So I want to know if I set block size at the beginning of creating tables,
 does something make troubles?

 Should not. We have tested with diff block sizes from def 64K to 8K fro
 testing purposes.  Have not came across issues like this.  Only on this
 data it is coming or every time u create a new table with 32K as block size
 and do some writes and then do read, this issue comes?

 -Anoop-

 On Mon, May 13, 2013 at 1:36 PM, Bing Jiang jiangbinglo...@gmail.com
 wrote:

  hi, Anoop.
  I do not handle or change the hbase checksum.
 
  So I want to know if I set block size at the beginning of creating
 tables,
  does something make troubles?
 
 
  2013/5/13 Anoop John anoop.hb...@gmail.com
 
Current pos = 32651;
   currKeyLen = 45; currValLen = 80; block limit = 32775
  
   This means after the cur position we need to have atleast  45+80+4(key
   length stored as 4 bytes) +4(value length 4 bytes)
   So atleast 32784 should have been the limit.  If we have memstoreTS
 also
   written with this KV some more bytes..
  
   Do u use Hbase handled checksum?
  
   -Anoop-
  
   On Mon, May 13, 2013 at 12:00 PM, Bing Jiang jiangbinglo...@gmail.com
   wrote:
  
Hi,all
Before the exception stack, there is an Error log:
2013-05-13 00:00:14,491 ERROR
org.apache.hadoop.hbase.io.hfile.HFileReaderV2: Current pos = 32651;
currKeyLen = 45; currValLen = 80; block limit = 32775; HFile name =
1f96183d55144c058fa2a05fe5c0b814; currBlock currBlockOffset =
 33550830
   
And the operation is scanner's next.
 Current pos + currKeyLen + currValLen > block limit
 32651+45 +80 = 32776 > 32775 , and in my table configs, set blocksize
32768, and when I change the value from blocksize from 64k(default
  value)
to 32k, so many error logs being found.
   
I use 0.94.3, can someone tell me the influence of blocksize setting.
   
Tks.
   
   
   
   
2013/5/13 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com
   
 Your TTL is negative here 'TTL = '-1','.

 Any reason for it to be negative? This could be a possible reason.
   Not
 sure..

 Regards
 Ram


 On Mon, May 13, 2013 at 7:20 AM, Bing Jiang 
  jiangbinglo...@gmail.com
 wrote:

  hi, Ted.
 
  No data block encoding, our table config below:
 
  User Table Description
  CrawlInfohttp://10.100.12.33:8003/table.jsp?name=CrawlInfo
 {NAME
  = 'CrawlInfo', DEFERRED_LOG_FLUSH = 'true', MAX_FILESIZE =
  '34359738368', FAMILIES = [{NAME = 'CrawlStats', BLOOMFILTER =
 'ROWCOL',
  CACHE_INDEX_ON_WRITE = 'true', TTL = '-1', CACHE_DATA_ON_WRITE
 =
 'true',
  CACHE_BLOOMS_ON_WRITE = 'true', VERSIONS = '1', BLOCKSIZE =
'32768'}]}
 
 
 
  2013/5/13 Bing Jiang jiangbinglo...@gmail.com
 
   Hi, JM.
   Our jdk version is 1.6.0_38
  
  
   2013/5/13 Jean-Marc Spaggiari jean-m...@spaggiari.org
  
   Hi Bing,
  
   Which JDK are you using?
  
   Thanks,
  
   JM
  
   2013/5/12 Bing Jiang jiangbinglo...@gmail.com
  
Yes, we use hbase-0.94.3 , and  we change block.size from
 64k
  to
 32k.
   
   
2013/5/13 Ted Yu yuzhih...@gmail.com
   
 Can you tell us the version of hbase you are using ?
 Did this problem happen recently ?

 Thanks

 On May 12, 2013, at 6:25 PM, Bing Jiang 
jiangbinglo...@gmail.com
 
wrote:

  Hi, all.
  In our hbase cluster, there are many logs like below:
 
      2013-05-13 00:00:04,161 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
      java.lang.IllegalArgumentException
      at java.nio.Buffer.position(Buffer.java:216)
      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:882)
      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:753)
      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:487)
      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)

Re: Block size of HBase files

2013-05-13 Thread Praveen Bysani
Hi,

I wanted to minimize the number of map-reduce tasks generated while
processing a job, hence I configured it to a larger value.

I don't think I have configured the HFile size in the cluster. I use Cloudera
Manager to manage my cluster, and the only configuration I can relate
to is hfile.block.cache.size,
which is set to 0.25. How do I change the HFile size?

On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:

 On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani praveen.ii...@gmail.com
 wrote:

  Hi,
 
  I have the dfs.block.size value set to 1 GB in my cluster configuration.


 Just out of curiosity - why do you have it set at 1GB?


  I
  have around 250 GB of data stored in hbase over this cluster. But when i
  check the number of blocks, it doesn't correspond to the block size
 value i
  set. From what i understand i should only have ~250 blocks. But instead
  when i did a fsck on the /hbase/table-name, i got the following
 
  Status: HEALTHY
   Total size:265727504820 B
   Total dirs:1682
   Total files:   1459
   Total blocks (validated):  1459 (avg. block size 182129886 B)
   Minimally replicated blocks:   1459 (100.0 %)
   Over-replicated blocks:0 (0.0 %)
   Under-replicated blocks:   0 (0.0 %)
   Mis-replicated blocks: 0 (0.0 %)
   Default replication factor:3
   Average block replication: 3.0
   Corrupt blocks:0
   Missing replicas:  0 (0.0 %)
   Number of data-nodes:  5
   Number of racks:   1
 
  Are there any other configuration parameters that need to be set ?


 What is your HFile size set to? The HFiles that get persisted would be
 bound by that number. Thereafter each HFile would be split into blocks, the
 size of which you configure using the dfs.block.size configuration
 parameter.


 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com
 




-- 
Regards,
Praveen Bysani
http://www.praveenbysani.com


Re: Block size of HBase files

2013-05-13 Thread Anoop John
Praveen,

How many regions are there in your table, and how many CFs?
Under /hbase/<table-name> you will see many files and directories. There is a
.tableinfo file, every region has a .regioninfo file, and under each CF sit
the data files (HFiles). Your total data is 250 GB. If your block size is 1 GB
and you had only one file of 250 GB, then the number you are expecting would
make sense. But that is not how HBase stores its data.

HFiles are created per CF per region. Also, as data comes in (writes), by
default HBase flushes it as a file into HDFS after 128 MB, so it makes a file
in HDFS with 1 block (in your case). Later these smaller files get merged into
bigger ones (compaction). At the time you checked, had any major compactions
run? A major compaction merges all files under a CF within a region into one
HFile. So if you have 100 regions and 2 CFs for the table, after a major
compaction you will have 200 HFiles. (Remember that under /hbase/<table-name>
you will also see some files other than the HFiles.)

The #files and avg block size displayed below explain why you have that many
blocks.

The HFile size Amandeep was referring to is the max size for an HFile (and
thus for a region). If you keep writing data to a region and the data size
crosses this max size, HBase will split that region into 2.

Can you try checking the file count and block count after running a major
compaction?

What MR job are you trying to run with HBase, and why would you run MR
directly on the HFiles? When you run an MR job over HBase (like a Scan using
MR), it is not the #files or #blocks that decides the #mappers; it is based on
the #regions in the table.
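
(A small client-side sketch of the two checks suggested above, against the 0.94 API;
the table name is a placeholder.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class CompactAndCountRegions {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String tableName = "my_table";              // placeholder table name

        // Ask the cluster to major-compact the table (runs asynchronously on the region servers).
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.majorCompact(tableName);

        // Count the regions of the table; an MR scan over HBase gets one mapper per region.
        HTable table = new HTable(conf, tableName);
        System.out.println("regions: " + table.getRegionLocations().size());

        table.close();
        admin.close();
      }
    }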

-Anoop-

On Mon, May 13, 2013 at 3:15 PM, Praveen Bysani praveen.ii...@gmail.comwrote:

 Hi,

 I wanted to minimize on the number of map reduce tasks generated while
 processing a job, hence configured it to a larger value.

 I don't think i have configured HFile size in the cluster. I use Cloudera
 Manager to mange my cluster, and the only configuration i can relate
 to is hfile.block.cache.size
 which is set to 0.25. How do i change the HFile size ?

 On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:

  On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani 
 praveen.ii...@gmail.com
  wrote:
 
   Hi,
  
   I have the dfs.block.size value set to 1 GB in my cluster
 configuration.
 
 
  Just out of curiosity - why do you have it set at 1GB?
 
 
   I
   have around 250 GB of data stored in hbase over this cluster. But when
 i
   check the number of blocks, it doesn't correspond to the block size
  value i
   set. From what i understand i should only have ~250 blocks. But instead
   when i did a fsck on the /hbase/table-name, i got the following
  
   Status: HEALTHY
Total size:265727504820 B
Total dirs:1682
Total files:   1459
Total blocks (validated):  1459 (avg. block size 182129886 B)
Minimally replicated blocks:   1459 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 3.0
Corrupt blocks:0
Missing replicas:  0 (0.0 %)
Number of data-nodes:  5
Number of racks:   1
  
   Are there any other configuration parameters that need to be set ?
 
 
  What is your HFile size set to? The HFiles that get persisted would be
  bound by that number. Thereafter each HFile would be split into blocks,
 the
  size of which you configure using the dfs.block.size configuration
  parameter.
 
 
  
   --
   Regards,
   Praveen Bysani
   http://www.praveenbysani.com
  
 



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com



Re: Block size of HBase files

2013-05-13 Thread Ted Yu
You can change HFile size through hbase.hregion.max.filesize parameter. 
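
(For illustration, a sketch of setting the per-table equivalent of hbase.hregion.max.filesize
from the 0.94 Java client; the table name is a placeholder, and the disable/modify/enable
sequence is just the conservative way to alter an existing table.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SetMaxFileSize {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] tableName = Bytes.toBytes("my_table");   // placeholder table name

        admin.disableTable(tableName);
        HTableDescriptor desc = admin.getTableDescriptor(tableName);
        desc.setMaxFileSize(1L * 1024 * 1024 * 1024);   // 1 GB max store file size per region
        admin.modifyTable(tableName, desc);
        admin.enableTable(tableName);
        admin.close();
      }
    }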

On May 13, 2013, at 2:45 AM, Praveen Bysani praveen.ii...@gmail.com wrote:

 Hi,
 
 I wanted to minimize on the number of map reduce tasks generated while
 processing a job, hence configured it to a larger value.
 
 I don't think i have configured HFile size in the cluster. I use Cloudera
 Manager to mange my cluster, and the only configuration i can relate
 to is hfile.block.cache.size
 which is set to 0.25. How do i change the HFile size ?
 
 On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:
 
 On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani praveen.ii...@gmail.com
 wrote:
 
 Hi,
 
 I have the dfs.block.size value set to 1 GB in my cluster configuration.
 
 
 Just out of curiosity - why do you have it set at 1GB?
 
 
 I
 have around 250 GB of data stored in hbase over this cluster. But when i
 check the number of blocks, it doesn't correspond to the block size
 value i
 set. From what i understand i should only have ~250 blocks. But instead
 when i did a fsck on the /hbase/table-name, i got the following
 
 Status: HEALTHY
 Total size:265727504820 B
 Total dirs:1682
 Total files:   1459
 Total blocks (validated):  1459 (avg. block size 182129886 B)
 Minimally replicated blocks:   1459 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 3.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  5
 Number of racks:   1
 
 Are there any other configuration parameters that need to be set ?
 
 
 What is your HFile size set to? The HFiles that get persisted would be
 bound by that number. Thereafter each HFile would be split into blocks, the
 size of which you configure using the dfs.block.size configuration
 parameter.
 
 
 
 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com
 
 
 
 -- 
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com


Re: Block size of HBase files

2013-05-13 Thread Praveen Bysani
Hi,

Thanks for the details. No, I haven't run any compaction, and I have no idea
if there is one going on in the background. I executed a major_compact on that
table and I now have 731 regions (each about ~350 MB!). I checked the
configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
too!

I am not trying to access HFiles in my MR job; in fact I am just using a Pig
script which handles this. This number (731) is close to my number of map
tasks, which makes sense. But how can I decrease it? Shouldn't the size
of each region be 1 GB with that configuration value?


On 13 May 2013 18:36, Ted Yu yuzhih...@gmail.com wrote:

 You can change HFile size through hbase.hregion.max.filesize parameter.

 On May 13, 2013, at 2:45 AM, Praveen Bysani praveen.ii...@gmail.com
 wrote:

  Hi,
 
  I wanted to minimize on the number of map reduce tasks generated while
  processing a job, hence configured it to a larger value.
 
  I don't think i have configured HFile size in the cluster. I use Cloudera
  Manager to mange my cluster, and the only configuration i can relate
  to is hfile.block.cache.size
  which is set to 0.25. How do i change the HFile size ?
 
  On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:
 
  On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani 
 praveen.ii...@gmail.com
  wrote:
 
  Hi,
 
  I have the dfs.block.size value set to 1 GB in my cluster
 configuration.
 
 
  Just out of curiosity - why do you have it set at 1GB?
 
 
  I
  have around 250 GB of data stored in hbase over this cluster. But when
 i
  check the number of blocks, it doesn't correspond to the block size
  value i
  set. From what i understand i should only have ~250 blocks. But instead
  when i did a fsck on the /hbase/table-name, i got the following
 
  Status: HEALTHY
  Total size:265727504820 B
  Total dirs:1682
  Total files:   1459
  Total blocks (validated):  1459 (avg. block size 182129886 B)
  Minimally replicated blocks:   1459 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 3.0
  Corrupt blocks:0
  Missing replicas:  0 (0.0 %)
  Number of data-nodes:  5
  Number of racks:   1
 
  Are there any other configuration parameters that need to be set ?
 
 
  What is your HFile size set to? The HFiles that get persisted would be
  bound by that number. Thereafter each HFile would be split into blocks,
 the
  size of which you configure using the dfs.block.size configuration
  parameter.
 
 
 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com
 
 
 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com




-- 
Regards,
Praveen Bysani
http://www.praveenbysani.com


Re: Block size of HBase files

2013-05-13 Thread Anoop John
now have 731 regions (each about ~350 mb !!). I checked the
configuration in CM, and the value for hbase.hregion.max.filesize  is 1 GB
too !!!

Did you specify any splits at the time of table creation? How did you create
the table?

-Anoop-

On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani praveen.ii...@gmail.comwrote:

 Hi,

 Thanks for the details. No i haven't run any compaction or i have no idea
 if there is one going on in background. I executed a major_compact on that
 table  and i now have 731 regions (each about ~350 mb !!). I checked the
 configuration in CM, and the value for hbase.hregion.max.filesize  is 1 GB
 too !!!

 I am not trying to access HFiles in my MR job, infact i am just using a PIG
 script which handles this. This number (731) is close to my number of map
 tasks, which makes sense. But how can i decrease this, shouldn't the size
 of each region be 1 GB with that configuration value ?


 On 13 May 2013 18:36, Ted Yu yuzhih...@gmail.com wrote:

  You can change HFile size through hbase.hregion.max.filesize parameter.
 
  On May 13, 2013, at 2:45 AM, Praveen Bysani praveen.ii...@gmail.com
  wrote:
 
   Hi,
  
   I wanted to minimize on the number of map reduce tasks generated while
   processing a job, hence configured it to a larger value.
  
   I don't think i have configured HFile size in the cluster. I use
 Cloudera
   Manager to mange my cluster, and the only configuration i can relate
   to is hfile.block.cache.size
   which is set to 0.25. How do i change the HFile size ?
  
   On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:
  
   On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani 
  praveen.ii...@gmail.com
   wrote:
  
   Hi,
  
   I have the dfs.block.size value set to 1 GB in my cluster
  configuration.
  
  
   Just out of curiosity - why do you have it set at 1GB?
  
  
   I
   have around 250 GB of data stored in hbase over this cluster. But
 when
  i
   check the number of blocks, it doesn't correspond to the block size
   value i
   set. From what i understand i should only have ~250 blocks. But
 instead
   when i did a fsck on the /hbase/table-name, i got the following
  
   Status: HEALTHY
   Total size:265727504820 B
   Total dirs:1682
   Total files:   1459
   Total blocks (validated):  1459 (avg. block size 182129886 B)
   Minimally replicated blocks:   1459 (100.0 %)
   Over-replicated blocks:0 (0.0 %)
   Under-replicated blocks:   0 (0.0 %)
   Mis-replicated blocks: 0 (0.0 %)
   Default replication factor:3
   Average block replication: 3.0
   Corrupt blocks:0
   Missing replicas:  0 (0.0 %)
   Number of data-nodes:  5
   Number of racks:   1
  
   Are there any other configuration parameters that need to be set ?
  
  
   What is your HFile size set to? The HFiles that get persisted would be
   bound by that number. Thereafter each HFile would be split into
 blocks,
  the
   size of which you configure using the dfs.block.size configuration
   parameter.
  
  
  
   --
   Regards,
   Praveen Bysani
   http://www.praveenbysani.com
  
  
  
   --
   Regards,
   Praveen Bysani
   http://www.praveenbysani.com
 



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com



Re: Block size of HBase files

2013-05-13 Thread Anoop John
I mean, when you created the table (using the client, I guess), did you specify
anything like splitKeys or [start, end, no# of regions]?

-Anoop-

On Mon, May 13, 2013 at 5:49 PM, Praveen Bysani praveen.ii...@gmail.comwrote:

 We insert data using java hbase client (org.apache.hadoop.hbase.client.*) .
 However we are not providing any details in the configuration object ,
 except for the zookeeper quorum, port number. Should we specify explicitly
 at this stage ?

 On 13 May 2013 19:54, Anoop John anoop.hb...@gmail.com wrote:

  now have 731 regions (each about ~350 mb !!). I checked the
  configuration in CM, and the value for hbase.hregion.max.filesize  is 1
 GB
  too !!!
 
  You mentioned the splits at the time of table creation?  How u created
 the
  table?
 
  -Anoop-
 
  On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani praveen.ii...@gmail.com
  wrote:
 
   Hi,
  
   Thanks for the details. No i haven't run any compaction or i have no
 idea
   if there is one going on in background. I executed a major_compact on
  that
   table  and i now have 731 regions (each about ~350 mb !!). I checked
 the
   configuration in CM, and the value for hbase.hregion.max.filesize  is 1
  GB
   too !!!
  
   I am not trying to access HFiles in my MR job, infact i am just using a
  PIG
   script which handles this. This number (731) is close to my number of
 map
   tasks, which makes sense. But how can i decrease this, shouldn't the
 size
   of each region be 1 GB with that configuration value ?
  
  
   On 13 May 2013 18:36, Ted Yu yuzhih...@gmail.com wrote:
  
You can change HFile size through hbase.hregion.max.filesize
 parameter.
   
On May 13, 2013, at 2:45 AM, Praveen Bysani praveen.ii...@gmail.com
 
wrote:
   
 Hi,

 I wanted to minimize on the number of map reduce tasks generated
  while
 processing a job, hence configured it to a larger value.

 I don't think i have configured HFile size in the cluster. I use
   Cloudera
 Manager to mange my cluster, and the only configuration i can
 relate
 to is hfile.block.cache.size
 which is set to 0.25. How do i change the HFile size ?

 On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:

 On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani 
praveen.ii...@gmail.com
 wrote:

 Hi,

 I have the dfs.block.size value set to 1 GB in my cluster
configuration.


 Just out of curiosity - why do you have it set at 1GB?


 I
 have around 250 GB of data stored in hbase over this cluster. But
   when
i
 check the number of blocks, it doesn't correspond to the block
 size
 value i
 set. From what i understand i should only have ~250 blocks. But
   instead
 when i did a fsck on the /hbase/table-name, i got the following

 Status: HEALTHY
 Total size:265727504820 B
 Total dirs:1682
 Total files:   1459
 Total blocks (validated):  1459 (avg. block size 182129886 B)
 Minimally replicated blocks:   1459 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 3.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  5
 Number of racks:   1

 Are there any other configuration parameters that need to be set
 ?


 What is your HFile size set to? The HFiles that get persisted
 would
  be
 bound by that number. Thereafter each HFile would be split into
   blocks,
the
 size of which you configure using the dfs.block.size configuration
 parameter.



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com
   
  
  
  
   --
   Regards,
   Praveen Bysani
   http://www.praveenbysani.com
  
 



 --
 Regards,
 Praveen Bysani
 http://www.praveenbysani.com



Re: Export / Import and table splits

2013-05-13 Thread Jean-Marc Spaggiari
Hi Jeremy,

Thanks for sharing this.

I will take a look at it, and also most probably give a try to the snapshot
option

JM

2013/5/7 Jeremy Carroll phobos...@gmail.com


 https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb

 I wrote a quick script to do it with mechanize + ruby. I have a new tool
 which I'm polishing up that does the same thing in Python but using the
 HBase REST interface to get the data.


 On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org
  wrote:

  Hi,
 
   When we are doing an export, we are only exporting the data. Then when
   we are importing it back, we need to make sure the table is
   pre-split correctly, else we might hotspot some servers.
  
   If you simply export and then import without pre-splitting at all, you
   will most probably bring some servers down because they will be
   overwhelmed with splits and compactions.
  
   Do we have any tool to pre-split a table the same way another table is
   already pre-split?
 
  Something like
   duplicate 'source_table', 'target_table'
 
  Which will create a new table called 'target_table' with exactly the
  same parameters as 'source_table' and the same regions boundaries?
 
  If we don't have, will it be useful to have one?
 
   Or even something like:
    create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
 
 
  JM
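
(For what it's worth, a minimal sketch under the 0.94 API of the 'duplicate' helper asked
about above: read the source table's region boundaries and feed them to createTable as
split keys. 'source_table', 'target_table' and 'f1' are the hypothetical names from the
message; nothing here is an existing HBase tool.)

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class DuplicateTableSplits {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Read the region boundaries of the existing table.
        HTable source = new HTable(conf, "source_table");
        byte[][] startKeys = source.getStartKeys();
        // Drop the first start key (the empty key of the first region);
        // the remaining keys are the split points for the new table.
        byte[][] splitKeys = startKeys.length > 1
            ? Arrays.copyOfRange(startKeys, 1, startKeys.length)
            : null;

        // Create the new table with the same column family and the same split points.
        HTableDescriptor desc = new HTableDescriptor("target_table");
        desc.addFamily(new HColumnDescriptor("f1"));
        admin.createTable(desc, splitKeys);

        source.close();
        admin.close();
      }
    }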
 



Re: Export / Import and table splits

2013-05-13 Thread Matteo Bertozzi
I'd go with the snapshots, since you avoid all the I/O of the
import/export, but the consistency model is different and you don't have
the start/end time option... you would have to delete the rows < tstart and
> tend after the clone.

Matteo



On Tue, May 14, 2013 at 1:48 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Jeremy,

 Thanks for sharing this.

 I will take a look at it, and also most probably give a try to the snapshot
 option

 JM

 2013/5/7 Jeremy Carroll phobos...@gmail.com

 
 
 https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb
 
  I wrote a quick script to do it with mechanize + ruby. I have a new tool
  which I'm polishing up that does the same thing in Python but using the
  HBase REST interface to get the data.
 
 
  On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org
   wrote:
 
   Hi,
  
   When we are doing an export, we are only exporting the data. Then when
   we are importing that back, we need to make sure the table is
   pre-splitted correctly else we might hotspot some servers.
  
   If you simply export then import without pre-splitting at all, you
   will most probably brought some servers down because they will be
   overwhelmed with splits and compactions.
  
   Do we have any tool to pre-split a table the same way another table is
   already pre-splitted?
  
   Something like
duplicate 'source_table', 'target_table'
  
   Which will create a new table called 'target_table' with exactly the
   same parameters as 'source_table' and the same regions boundaries?
  
   If we don't have, will it be useful to have one?
  
   Or event something like:
create 'target_table', 'f1', {SPLITS_MODEL = 'source_table'}
  
  
   JM
  
 



Re: Export / Import and table splits

2013-05-13 Thread Jean-Marc Spaggiari
The cluster is stopped anyway, so there are no consistency concerns, which
means snapshots might be the best option. No need to delete anything after.

The goal is really to export the data locally, take the cluster down, get a
new cluster, put the data back and reload the table... the 2 clusters can't be
up at the same time...

2013/5/13 Matteo Bertozzi theo.berto...@gmail.com

 I'll go with the snapshots since you can avoid all the I/O of the
 import/export but the consistency model is different, and you don't have
  the start/end time option... you should delete the rows < tstart and > tend
 after the clone

 Matteo



 On Tue, May 14, 2013 at 1:48 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi Jeremy,
 
  Thanks for sharing this.
 
  I will take a look at it, and also most probably give a try to the
 snapshot
  option
 
  JM
 
  2013/5/7 Jeremy Carroll phobos...@gmail.com
 
  
  
 
 https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb
  
   I wrote a quick script to do it with mechanize + ruby. I have a new
 tool
   which I'm polishing up that does the same thing in Python but using the
   HBase REST interface to get the data.
  
  
   On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari 
   jean-m...@spaggiari.org
wrote:
  
Hi,
   
When we are doing an export, we are only exporting the data. Then
 when
we are importing that back, we need to make sure the table is
pre-splitted correctly else we might hotspot some servers.
   
If you simply export then import without pre-splitting at all, you
will most probably brought some servers down because they will be
overwhelmed with splits and compactions.
   
Do we have any tool to pre-split a table the same way another table
 is
already pre-splitted?
   
Something like
 duplicate 'source_table', 'target_table'
   
Which will create a new table called 'target_table' with exactly the
same parameters as 'source_table' and the same regions boundaries?
   
If we don't have, will it be useful to have one?
   
Or event something like:
 create 'target_table', 'f1', {SPLITS_MODEL = 'source_table'}
   
   
JM
   
  
 



Re: Block size of HBase files

2013-05-13 Thread Praveen Bysani
Hi Anoop,

No, we didn't specify anything like that while creating or writing into the table.

On 13 May 2013 20:22, Anoop John anoop.hb...@gmail.com wrote:

 I mean when u created the table (Using client I guess)  have u specified
 any thuing like splitKeys or [start,end, no#regions]?

 -Anoop-

 On Mon, May 13, 2013 at 5:49 PM, Praveen Bysani praveen.ii...@gmail.com
 wrote:

  We insert data using java hbase client
 (org.apache.hadoop.hbase.client.*) .
  However we are not providing any details in the configuration object ,
  except for the zookeeper quorum, port number. Should we specify
 explicitly
  at this stage ?
 
  On 13 May 2013 19:54, Anoop John anoop.hb...@gmail.com wrote:
 
   now have 731 regions (each about ~350 mb !!). I checked the
   configuration in CM, and the value for hbase.hregion.max.filesize  is 1
  GB
   too !!!
  
   You mentioned the splits at the time of table creation?  How u created
  the
   table?
  
   -Anoop-
  
   On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani 
 praveen.ii...@gmail.com
   wrote:
  
Hi,
   
Thanks for the details. No i haven't run any compaction or i have no
  idea
if there is one going on in background. I executed a major_compact on
   that
table  and i now have 731 regions (each about ~350 mb !!). I checked
  the
configuration in CM, and the value for hbase.hregion.max.filesize
  is 1
   GB
too !!!
   
I am not trying to access HFiles in my MR job, infact i am just
 using a
   PIG
script which handles this. This number (731) is close to my number of
  map
tasks, which makes sense. But how can i decrease this, shouldn't the
  size
of each region be 1 GB with that configuration value ?
   
   
On 13 May 2013 18:36, Ted Yu yuzhih...@gmail.com wrote:
   
 You can change HFile size through hbase.hregion.max.filesize
  parameter.

 On May 13, 2013, at 2:45 AM, Praveen Bysani 
 praveen.ii...@gmail.com
  
 wrote:

  Hi,
 
  I wanted to minimize on the number of map reduce tasks generated
   while
  processing a job, hence configured it to a larger value.
 
  I don't think i have configured HFile size in the cluster. I use
Cloudera
  Manager to mange my cluster, and the only configuration i can
  relate
  to is hfile.block.cache.size
  which is set to 0.25. How do i change the HFile size ?
 
  On 13 May 2013 15:03, Amandeep Khurana ama...@gmail.com wrote:
 
  On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani 
 praveen.ii...@gmail.com
  wrote:
 
  Hi,
 
  I have the dfs.block.size value set to 1 GB in my cluster
 configuration.
 
 
  Just out of curiosity - why do you have it set at 1GB?
 
 
  I
  have around 250 GB of data stored in hbase over this cluster.
 But
when
 i
  check the number of blocks, it doesn't correspond to the block
  size
  value i
  set. From what i understand i should only have ~250 blocks. But
instead
  when i did a fsck on the /hbase/table-name, i got the
 following
 
  Status: HEALTHY
  Total size:265727504820 B
  Total dirs:1682
  Total files:   1459
  Total blocks (validated):  1459 (avg. block size 182129886
 B)
  Minimally replicated blocks:   1459 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 3.0
  Corrupt blocks:0
  Missing replicas:  0 (0.0 %)
  Number of data-nodes:  5
  Number of racks:   1
 
  Are there any other configuration parameters that need to be
 set
  ?
 
 
  What is your HFile size set to? The HFiles that get persisted
  would
   be
  bound by that number. Thereafter each HFile would be split into
blocks,
 the
  size of which you configure using the dfs.block.size
 configuration
  parameter.
 
 
 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com
 
 
 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com

   
   
   
--
Regards,
Praveen Bysani
http://www.praveenbysani.com
   
  
 
 
 
  --
  Regards,
  Praveen Bysani
  http://www.praveenbysani.com
 




-- 
Regards,
Praveen Bysani
http://www.praveenbysani.com