Re: Is HBase feasible for storing 4-5 MB of data as a cell value
The only other thing I'd add: by default, HBase caps the size of the data per column at 10 MB (I think). You can change that via the hbase.client.keyvalue.maxsize setting in hbase-site.xml; -1 means no cap, or you can set whatever cap is appropriate for your use case.

Ameya

On Tue, Feb 25, 2014 at 12:12 AM, shashwat shriparv <dwivedishash...@gmail.com> wrote:

Yes, you can certainly use HBase for this. Some options:

1. Put the different fields of the mail in different columns of a column family, with the attachment as a binary array in a column as well.
2. Keep the whole message in HBase columns, but store the attachments, which are large, on HDFS, keeping a reference to them in the HBase table.
3. The schema is up to you; decide how you store values based on your access patterns.

Shashwat Shriparv

On Tue, Feb 25, 2014 at 12:55 PM, Upendra Yadav <upendra1...@gmail.com> wrote:

I have to use HBase with mixed sizes of data: some values are 1-4 KB (mail headers) and others up to 5 MB (attachments). We also need only random access to any item. Is HBase feasible for storing this type of data? What should my schema design be? Will I have to go with two different tables, one for the 1-4 KB values and one for the big files (because a memstore flush would flush the other CF too, and there is heavy random access)? Or is there another way?

Thanks
Lease Exception Errors When Running Heavy Map Reduce Job
Hi All,

We have a very heavy MapReduce job that scans an entire HBase table with over 1 TB of data and exports all of it (similar to the Export job, but with some additional custom code built in) to HDFS. However, this job is not very stable, and oftentimes we get the following error and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.

Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed the following settings in HBase to counter this problem, but the issue persists:

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>
<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.rpc.timeout</name>
  <value>90</value>
</property>

We also reduced the number of mappers per RS to fewer than the available CPUs on the box.

We also observed that once the problem happens, it happens multiple times on the same RS, while all other regions are unaffected. But a different RS hits the problem on different days, and there is no particular region causing it either.

We are running 0.94.2 with cdh4.2.0. Any ideas?

Ameya
Re: Lease Exception Errors When Running Heavy Map Reduce Job
Thanks for your response. I checked the namenode logs and I find the following:

2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover lease [Lease. Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25, pendingcreates: 1], src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413 from client DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25, pendingcreates: 1], src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.

There are LeaseException errors on the namenode as well: http://pastebin.com/4feVcL1F

Not sure why it's happening. I do not think I am running into any timeouts, as my jobs fail within a couple of minutes, while all my timeouts are 10 minutes or more. Not sure why the above would happen.

Ameya

On Wed, Aug 28, 2013 at 9:00 AM, Ted Yu <yuzhih...@gmail.com> wrote:

From the log you posted on pastebin, I see the following. Can you check the namenode log to see what went wrong?

1. Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514 File does not exist. [Lease. Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25, pendingcreates: 1]

On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar <am...@groupon.com> wrote: [snip -- original message quoted above]
Re: Lease Exception Errors When Running Heavy Map Reduce Job
Any ideas? Anyone?

On Wed, Aug 28, 2013 at 9:36 AM, Ameya Kanitkar <am...@groupon.com> wrote: [snip -- reply quoted above]