Re: Is HBase is feasible for storing 4-5 MB of data as cell value

2014-02-25 Thread Ameya Kanitkar
The only other thing I'd add is that, by default, HBase caps the size of the
data in a single cell at 10 MB (I think). You can change that with this
setting in hbase-site.xml:

hbase.client.keyvalue.maxsize

The value is in bytes, and -1 means no cap. You can set any other number that
gives an appropriate cap for your use case.
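
A minimal sketch of how such a cap behaves, assuming the client rejects a cell only when the cap is positive and the cell is larger than it. The class and method names here are hypothetical stand-ins, not the HBase source, and the real check measures the whole KeyValue (key plus value) rather than the value alone.

```java
// Simplified stand-in for a client-side size check; not the HBase source.
final class KeyValueSizeCap {
    /** 10 MB, the default figure mentioned in the message above. */
    static final int DEFAULT_MAX_SIZE = 10 * 1024 * 1024;

    private final int maxSize;

    KeyValueSizeCap(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns true when a cell of the given size would be rejected. */
    boolean rejects(long cellSizeBytes) {
        // A non-positive cap (for example -1) disables the check entirely.
        return maxSize > 0 && cellSizeBytes > maxSize;
    }

    public static void main(String[] args) {
        long attachment = 5L * 1024 * 1024; // a 5 MB attachment
        System.out.println(new KeyValueSizeCap(DEFAULT_MAX_SIZE).rejects(attachment)); // false
        System.out.println(new KeyValueSizeCap(4 * 1024 * 1024).rejects(attachment));  // true
        System.out.println(new KeyValueSizeCap(-1).rejects(attachment));               // false
    }
}
```

With the default cap a 5 MB attachment fits, so the setting only needs changing if attachments can grow past 10 MB.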

Ameya


On Tue, Feb 25, 2014 at 12:12 AM, shashwat shriparv 
dwivedishash...@gmail.com wrote:

 Yes, you can certainly use HBase for this. Some options:
 1. Store the different fields of the mail in different columns of a column
 family, and the attachment as a byte array in another column.
 2. Keep the whole message in HBase columns and, if the attachments are
 large enough, store them on HDFS and keep a reference to them in the HBase
 table.
 3. The schema is yours to decide; you can draw up a matrix of how you store
 each value and decide from that.
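
 Option 2 can be sketched roughly as below. This is only an illustration: the two maps are in-memory stand-ins for an HBase table and an HDFS directory, and the size threshold, column names, and path layout are all assumptions rather than anything HBase prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-ins for an HBase table and HDFS; every name here is hypothetical.
final class MailStore {
    /** Assumed threshold above which a value is kept outside HBase. */
    static final int INLINE_LIMIT_BYTES = 1024 * 1024;

    final Map<String, byte[]> hbaseCells = new HashMap<>(); // "rowKey/column" -> value
    final Map<String, byte[]> hdfsFiles = new HashMap<>();  // path -> file contents

    /** Stores small values inline and large ones externally with a reference. */
    void putValue(String rowKey, byte[] value) {
        if (value.length <= INLINE_LIMIT_BYTES) {
            hbaseCells.put(rowKey + "/d:data", value);
        } else {
            String path = "/mail/attachments/" + rowKey;
            hdfsFiles.put(path, value);
            hbaseCells.put(rowKey + "/d:ref", path.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads the inline value if present, otherwise follows the reference. */
    byte[] getValue(String rowKey) {
        byte[] inline = hbaseCells.get(rowKey + "/d:data");
        if (inline != null) {
            return inline;
        }
        byte[] ref = hbaseCells.get(rowKey + "/d:ref");
        if (ref == null) {
            return null; // nothing stored under this row key
        }
        return hdfsFiles.get(new String(ref, StandardCharsets.UTF_8));
    }
}
```

 The reader pays one extra lookup for large values, in exchange for keeping multi-megabyte cells out of the memstore and block cache.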


 Warm Regards,
 Shashwat Shriparv



 On Tue, Feb 25, 2014 at 12:55 PM, Upendra Yadav upendra1...@gmail.com
 wrote:

  I have to use HBase and have mixed types of data.
 
  Some items are 1-4 KB (mail headers) and others are around
  5 MB (attachments).
 
  We also need only random access to any of the data.
 
  Is HBase feasible for storing this type of data?
 
  What should my schema design be? Will I have to go with two different
  tables, one for the 1-4 KB items and a second for the big files
  (because a memstore flush will also flush the other CF, and because of
  the heavy random access)?
 
  Or is there another way?
 
  Thanks
 



Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Hi all,

We have a very heavy MapReduce job that goes over an entire table with
1TB+ of data in HBase and exports all of it to HDFS (similar to the Export
job, but with some additional custom code built in).

However, this job is not very stable, and we often get the following error,
after which the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.


Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed the following settings in HBase to counter this problem,
but the issue persists:

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.rpc.timeout</name>
  <value>90</value>
</property>
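
Both properties are expressed in milliseconds. As a rough way to reason about them, the sketch below estimates how long a mapper spends between two scanner next() calls and compares that with the lease period. The model (gap = scanner caching x per-row processing time) and all the numbers are assumptions for illustration, not measurements from this job or the settings quoted above.

```java
// Back-of-the-envelope model; class name, method names and figures are hypothetical.
final class ScannerLeaseBudget {
    /** Estimated milliseconds a mapper spends between two scanner next() calls. */
    static long millisBetweenNextCalls(int scannerCaching, long perRowMillis) {
        return (long) scannerCaching * perRowMillis;
    }

    /** True when the estimated gap stays within the lease period. */
    static boolean fitsWithinLease(int scannerCaching, long perRowMillis, long leasePeriodMillis) {
        return millisBetweenNextCalls(scannerCaching, perRowMillis) <= leasePeriodMillis;
    }

    public static void main(String[] args) {
        long leasePeriodMillis = 600_000L; // assumed 10-minute lease period
        long perRowMillis = 50L;           // assumed processing cost per row

        // 1,000 rows per fetch: 50,000 ms between calls, well inside the lease.
        System.out.println(fitsWithinLease(1_000, perRowMillis, leasePeriodMillis));  // true
        // 20,000 rows per fetch: 1,000,000 ms between calls, longer than the lease.
        System.out.println(fitsWithinLease(20_000, perRowMillis, leasePeriodMillis)); // false
    }
}
```

If the estimate comes out close to the lease period, lowering the scanner caching is usually a cheaper experiment than raising the timeouts further.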


We also reduced the number of mappers per RS to fewer than the available
CPUs on the box.

We also observed that once the problem happens, it happens multiple times
on the same RS. All other regions are unaffected. But different RSs see
this problem on different days. There is no particular region causing this
either.

We are running HBase 0.94.2 with CDH 4.2.0.

Any ideas?


Ameya


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Thanks for your response.

I checked the namenode logs and found the following:

2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover
lease [Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
from client
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
lease=[Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK*
internalReleaseLease: All existing blocks are COMPLETE, lease removed, file
closed.

There are LeaseException errors on the namenode as well:
http://pastebin.com/4feVcL1F. Not sure why it's happening.

I do not think I am hitting any timeouts, as my jobs fail within a
couple of minutes, while all my timeouts are 10 minutes+.
Not sure why above would

Ameya



On Wed, Aug 28, 2013 at 9:00 AM, Ted Yu yuzhih...@gmail.com wrote:

 From the log you posted on pastebin, I see the following.
 Can you check namenode log to see what went wrong ?

 Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
 /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514
 File does not exist. [Lease.  Holder:
 DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25,
 pendingcreates: 1]



 



Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Any ideas? Anyone?

