Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Eugeny Morozov
Finally, the mystery has been solved.

Small remark before I explain everything.

The situation with only one region is absolutely the same:
Fzzy: 1Q7iQ9JA
Next fzzy: F7dtxwqVQ_Pw  -- the value I'm trying to find.
Fzzy: F7dt8QWPSIDw
Somehow FuzzyRowFilter has just omitted my value here.


So, the explanation.
In the javadoc for FuzzyRowFilter the question mark is used as a placeholder for an
unknown value. Of course it's possible to use anything, including zero, instead of
the question mark.
For quite some time we used literals to encode our keys - literals like the ones
you've seen already: 1Q7iQ9JA or F7dt8QWPSIDw. But that's the Base64 form
of just 8 bytes, which requires 1.5 times more space. So we decided to
store the raw version - just byte[8]. Unfortunately the symbol '?' (0x3F) sits
right in the middle of the byte range (see the ASCII table,
http://www.asciitable.com/), which means that with FuzzyRowFilter we skip half
of the values in some cases. At the same time, the question mark sorts just before
any letter that could be used in a key.
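
A quick illustration of the byte values involved (plain Java, nothing specific to this thread):

// '?' is 0x3F (63): it sits in the middle of the unsigned byte range and sorts
// just below '@' (0x40) and 'A' (0x41), i.e. before every letter used in the keys.
System.out.println((int) '?');  // 63
System.out.println((int) 'A');  // 65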

Despite the fact that we have integration tests, it's just a coincidence that we
don't have such an example in there.

So, as a piece of advice: always use a zero byte instead of the question mark for
FuzzyRowFilter.
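
For illustration, a minimal sketch of building such a filter with zero-byte fillers, assuming the FuzzyRowFilter(List<Pair<byte[], byte[]>>) constructor from HBASE-6509; the key layout (8-byte raw keys, first byte fixed) is made up:

import java.util.Collections;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyScanExample {
  public static Scan buildFuzzyScan() {
    // 8-byte raw key template: position 0 is fixed, positions 1..7 are fuzzy.
    byte[] keyTemplate = new byte[8];
    keyTemplate[0] = 0x0F;      // hypothetical fixed prefix byte
    // positions 1..7 stay 0x00: zero filler instead of '?' (0x3F)

    byte[] fuzzyMask = new byte[] {0, 1, 1, 1, 1, 1, 1, 1};  // 0 = fixed, 1 = fuzzy

    Scan scan = new Scan();
    scan.setFilter(new FuzzyRowFilter(
        Collections.singletonList(new Pair<byte[], byte[]>(keyTemplate, fuzzyMask))));
    return scan;
  }
}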

Thanks to everyone!

P.S. The question about the region scanning order is still open, though. I do not
understand why, with the FuzzyRowFilter, the scan goes from one region to another
until it stops at the value. I suppose that if the scanning process had started on
all regions at once, I would find in the log files at least one value per region,
but I have found one value per region only for those regions that reside
before the particular one.


On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel michael_se...@hotmail.com wrote:

 If its the same class and its not a patch, then the first class loaded
 wins.

 So if you have a Class Foo and HBase has a Class Foo, your code will never
 see the light of day.

 Perhaps I'm stating the obvious but its something to think about when
 working w Hadoop.

 On Jan 19, 2013, at 3:36 AM, Eugeny Morozov emoro...@griddynamics.com
 wrote:

  Ted,
 
  that is correct.
  HBase 0.92.x and we use part of the patch 6509.
 
  I use the filter as a custom filter, it lives in separate jar file and
 goes
  to HBase's classpath. I did not patch HBase.
  Moreover I do not use protobuf's descriptions that comes with the filter
 in
  patch. Only two classes I have - FuzzyRowFilter itself and its test
 class.
 
  And it works perfectly on a small dataset like 100 rows (1 region). But when
  my dataset is more than 10 mln rows (260 regions), it somehow loses rows. I'm
  not sure, but it seems to me it is not the fault of the filter.
 
 
  On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu yuzhih...@gmail.com wrote:
 
  To my knowledge CDH-4.1.2 is based on HBase 0.92.x
 
  Looks like you were using patch from HBASE-6509 which was integrated to
  trunk only.
  Please confirm.
 
  Copying Alex who wrote the patch.
 
  Cheers
 
  On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
  emoro...@griddynamics.com wrote:
 
  Hi, folks!
 
  HBase, Hadoop, etc version is CDH-4.1.2
 
  I'm using custom FuzzyRowFilter, which I get from
 
 
 
  http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
  and suddenly, after quite some time, we found that it starts losing data.
 
  Basically the idea of FuzzyRowFilter is that it tries to find the key that
  has been provided, and if there is no such key - but more keys exist in the
  table - it returns SEEK_NEXT_USING_HINT. And in getNextKeyHint(...) it builds
  the required key. As I understand it, HBase will then fast-forward to the
  required key - it should behave similarly to, or the same as, a Scan with
  setStartRow.
 
  I'm trying to find key F7dt8QWPSIDw, it is definitely in HBase - I'm
 able
  to get it using Scan.setStartRow.
  For FuzzyFilter I'm using empty Scan - I didn't specify start row, stop
  row
  or anything related.
  That's what happening:
 
  Fzzy: 1Q7iQ9JA
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: AQAAnA96rxTg
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: AgAADQWPSIDw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: AwAA-Q33Zb9Q
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: BAAAOg8oyu7A
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: BQAA9gqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: BgABZQ7iQ9JA
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: BwAAbgrpAojg
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: CAAAUQWPSIDw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: CQABVgqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: CgAAOQ7iQ9JA
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: CwAALwqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: DAAAMwWPSIDw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: DQAADgjqzsIQ
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: DgAAOgCcWv9g
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: DwAAKg7iQ9JA
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: EAAAugqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: EQAAJAqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: EgAABgIOMBgg
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: EwAAEwqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: FAAACQqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: FQAAIAqVQrTw
  Next fzzy: F7dtxwqVQ_Pw
  Fzzy: FgAAeAWPSIDw
  

Re: confused about Data/Disk ratio

2013-01-21 Thread varun kumar
Hi Tian,

What is the replication factor you mention in HDFS?

Regards,
Varun Kumar.P

On Mon, Jan 21, 2013 at 12:17 PM, tgh guanhua.t...@ia.ac.cn wrote:

 Hi
 I use hbase to store Data, and I have an observation, that is,
 When hbase store 1Gb data, hdfs use 10Gb disk space, and when data
 is 60Gb, hdfs use 180Gb disk, and when data is about 2Tb, hdfs use 3Tb
 disk,

 That is, the ratio of data/disk is not a linear one, and why,

 Could you help me


 Thank you
 -
 Guanhua Tian







-- 
Regards,
Varun Kumar.P


Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread ramkrishna vasudevan
On Mon, Jan 21, 2013 at 1:46 PM, Eugeny Morozov
emoro...@griddynamics.com wrote:

 I do not
 understand why with FuzzyFilter it goes from one region to another until it
 stops at the value. I suppose if scanning process has started at once on
 all regions


The scanning process does not start in parallel on all regions.  Once a start row
is specified with the scan, the corresponding region server is picked,
and on that region server
the scan starts from the region which holds the start row; the scan
proceeds till it finds the stop row. The stop row can be in any of the regions
on the same region server, and the scan moves in exact increasing byte order.
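
For example, a plain client-side scan of this form walks the regions between the start and stop rows one after another (a minimal sketch; the table name and row keys below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SequentialScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("F7dt"));  // region holding this row is contacted first
    scan.setStopRow(Bytes.toBytes("F7du"));   // scan stops when it reaches this row
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // rows arrive in increasing byte order, one region at a time
      System.out.println(Bytes.toStringBinary(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}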

Regards
Ram


RE: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Anoop Sam John
 I suppose if scanning process has started at once on
all regions, then I would find in log files at least one value per region,
but I have found one value per region only for those regions, that resides
before the particular one.

@Eugeny -  FuzzyRowFilter, like any other filter, works on the server side. The 
scanning from the client side is sequential, starting from the first region 
(the region with an empty start key, or the region which contains whatever 
start key you specified in your Scan). From the client, a request goes to a 
region server to scan a region. Once that region is exhausted, the next region 
is contacted for the scan (from the client), and so on.  There is no parallel 
scanning of multiple regions from the client side.  [This is when using the HTable scan APIs.]

When MapReduce is used for scanning, we do parallel scans across all the 
regions; there is one mapper per region.  But a normal scan from the 
client side is sequential over the regions, not parallel.
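
As a hedged sketch of the MapReduce route described above, where each region gets its own map task (the table name, mapper logic, and scan settings are only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ParallelRegionScan {

  static class MyScanMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
      // per-row processing happens here, in parallel across regions
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "parallel-region-scan");
    job.setJarByClass(ParallelRegionScan.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger caching for MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    // One map task is created per region of the table.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyScanMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
  }
}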

-Anoop-

From: Eugeny Morozov [emoro...@griddynamics.com]
Sent: Monday, January 21, 2013 1:46 PM
To: user@hbase.apache.org
Cc: Alex Baranau
Subject: Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Finally, the mystery has been solved.

Small remark before I explain everything.

The situation with only region is absolutely the same:
Fzzy: 1Q7iQ9JA
Next fzzy: F7dtxwqVQ_Pw  -- the value I'm trying to find.
Fzzy: F7dt8QWPSIDw
Somehow FuzzyRowFilter has just omit my value here.


So, the explanation.
In javadoc for FuzzyRowFilter question mark is used as substitution for
unknown value. Of course it's possible to use anything including zero
instead of question mark.
For quite some time we used literals to encode our keys. Literals like
you've seen already: 1Q7iQ9JA or F7dt8QWPSIDw. But that's Base64 form
of just 8 bytes, which requires 1.5 times more space. So we've decided to
store raw version - just  byte[8]. But unfortunately the symbol '?' is
exactly in the middle of the byte (according to ascii table
http://www.asciitable.com/), which means with FuzzyRowFilter we skip half
of values in some cases. In the same time question mark is exactly before
any letter that could be used in key.

Despite the fact we have integration tests - that's just a coincidence we
haven't such an example in there.

So, as an advice - always use zero instead of question mark for
FuzzyRowFilter.

Thank's to everyone!

P.S. But the question with region scanning order is still here. I do not
understand why with FuzzyFilter it goes from one region to another until it
stops at the value. I suppose if scanning process has started at once on
all regions, then I would find in log files at least one value per region,
but I have found one value per region only for those regions, that resides
before the particular one.


On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel michael_se...@hotmail.com wrote:

 If its the same class and its not a patch, then the first class loaded
 wins.

 So if you have a Class Foo and HBase has a Class Foo, your code will never
 see the light of day.

 Perhaps I'm stating the obvious but its something to think about when
 working w Hadoop.

 On Jan 19, 2013, at 3:36 AM, Eugeny Morozov emoro...@griddynamics.com
 wrote:

  Ted,
 
  that is correct.
  HBase 0.92.x and we use part of the patch 6509.
 
  I use the filter as a custom filter, it lives in separate jar file and
 goes
  to HBase's classpath. I did not patch HBase.
  Moreover I do not use protobuf's descriptions that comes with the filter
 in
  patch. Only two classes I have - FuzzyRowFilter itself and its test
 class.
 
  And it works perfectly on small dataset like 100 rows (1 region). But
 when
  my dataset is more than 10mln (260 regions), it somehow loosing rows. I'm
  not sure, but it seems to me it is not fault of the filter.
 
 
  On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu yuzhih...@gmail.com wrote:
 
  To my knowledge CDH-4.1.2 is based on HBase 0.92.x
 
  Looks like you were using patch from HBASE-6509 which was integrated to
  trunk only.
  Please confirm.
 
  Copying Alex who wrote the patch.
 
  Cheers
 
  On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
   emoro...@griddynamics.com wrote:
 
  Hi, folks!
 
  HBase, Hadoop, etc version is CDH-4.1.2
 
  I'm using custom FuzzyRowFilter, which I get from
 
 
 
  http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ and
  suddenly after quite a time we found that it starts loosing data.
 
  Basically the idea of FuzzyRowFilter is that it tries to find key that
  has
  been provided and if there is no such a key - but more exists in table
 -
  it
  returns SEEK_NEXT_USING_HINT. And in getNextKeyHint(...) it builds
  required
  key. As I understand, HBase in this key will fast-forward to required
  key -
  it must be similar or same as to get Scan with setStartRow.
 
  I'm trying to find key F7dt8QWPSIDw, it is definitely in HBase - I'm
 able
 

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Eugeny Morozov
Anoop, Ramkrishna

Thank you for the explanation! I've got it.


On Mon, Jan 21, 2013 at 12:59 PM, Anoop Sam John anoo...@huawei.com wrote:

  I suppose if scanning process has started at once on
 all regions, then I would find in log files at least one value per region,
 but I have found one value per region only for those regions, that resides
 before the particular one.

 @Eugeny -  FuzzyFilter like any other filter works at the server side. The
 scanning from client side will be like sequential starting from the 1st
 region (Region with empty startkey or the corresponding region which
 contains the startkey whatever you mentioned in your scan). From client,
 request will go to RS for scanning a region. Once that region is over the
 next region will be contacted for scan(from client) and so on.  There is no
 parallel scanning of multiple regions from client side.  [This is when
 using a HTable scan APIs]

 When MR used for scanning, we will be doing parallel scans from all the
 regions. Here will be having mappers per region.  But the normal scan from
 client side will be sequential on the regions not parallel.

 -Anoop-
 
 From: Eugeny Morozov [emoro...@griddynamics.com]
 Sent: Monday, January 21, 2013 1:46 PM
 To: user@hbase.apache.org
 Cc: Alex Baranau
 Subject: Re: Custom Filter and SEEK_NEXT_USING_HINT issue

 Finally, the mystery has been solved.

 Small remark before I explain everything.

 The situation with only region is absolutely the same:
 Fzzy: 1Q7iQ9JA
 Next fzzy: F7dtxwqVQ_Pw  -- the value I'm trying to find.
 Fzzy: F7dt8QWPSIDw
 Somehow FuzzyRowFilter has just omit my value here.


 So, the explanation.
 In javadoc for FuzzyRowFilter question mark is used as substitution for
 unknown value. Of course it's possible to use anything including zero
 instead of question mark.
 For quite some time we used literals to encode our keys. Literals like
 you've seen already: 1Q7iQ9JA or F7dt8QWPSIDw. But that's Base64 form
 of just 8 bytes, which requires 1.5 times more space. So we've decided to
 store raw version - just  byte[8]. But unfortunately the symbol '?' is
 exactly in the middle of the byte (according to ascii table
 http://www.asciitable.com/), which means with FuzzyRowFilter we skip half
 of values in some cases. In the same time question mark is exactly before
 any letter that could be used in key.

 Despite the fact we have integration tests - that's just a coincidence we
 haven't such an example in there.

 So, as an advice - always use zero instead of question mark for
 FuzzyRowFilter.

 Thank's to everyone!

 P.S. But the question with region scanning order is still here. I do not
 understand why with FuzzyFilter it goes from one region to another until it
 stops at the value. I suppose if scanning process has started at once on
 all regions, then I would find in log files at least one value per region,
 but I have found one value per region only for those regions, that resides
 before the particular one.


 On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel michael_se...@hotmail.com
 wrote:

  If its the same class and its not a patch, then the first class loaded
  wins.
 
  So if you have a Class Foo and HBase has a Class Foo, your code will
 never
  see the light of day.
 
  Perhaps I'm stating the obvious but its something to think about when
  working w Hadoop.
 
  On Jan 19, 2013, at 3:36 AM, Eugeny Morozov emoro...@griddynamics.com
  wrote:
 
   Ted,
  
   that is correct.
   HBase 0.92.x and we use part of the patch 6509.
  
   I use the filter as a custom filter, it lives in separate jar file and
  goes
   to HBase's classpath. I did not patch HBase.
   Moreover I do not use protobuf's descriptions that comes with the
 filter
  in
   patch. Only two classes I have - FuzzyRowFilter itself and its test
  class.
  
   And it works perfectly on small dataset like 100 rows (1 region). But
  when
   my dataset is more than 10mln (260 regions), it somehow loosing rows.
 I'm
   not sure, but it seems to me it is not fault of the filter.
  
  
   On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu yuzhih...@gmail.com wrote:
  
   To my knowledge CDH-4.1.2 is based on HBase 0.92.x
  
   Looks like you were using patch from HBASE-6509 which was integrated
 to
   trunk only.
   Please confirm.
  
   Copying Alex who wrote the patch.
  
   Cheers
  
   On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
    emoro...@griddynamics.com wrote:
  
   Hi, folks!
  
   HBase, Hadoop, etc version is CDH-4.1.2
  
   I'm using custom FuzzyRowFilter, which I get from
  
  
  
 
  http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ and
   suddenly after quite a time we found that it starts loosing data.
  
   Basically the idea of FuzzyRowFilter is that it tries to find key
 that
   has
   been provided and if there is no such a key - but more exists in
 table
  -
   it
   returns SEEK_NEXT_USING_HINT. And in 

Re: Hbase MapReduce - Problem in using ArrayList of Puts in Map function

2013-01-21 Thread Farrokh Shahriari
Thanks, but I don't know why I got bad results when client.buffer.size was
increased. Is it related to other parameters? And I give 8 GB of heap to
each region server.
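
For reference, here is a minimal sketch (not from this thread; the table name, column family, and buffer size are made up) of batching a list of Puts from a mapper with autoflush off and an explicit client-side write buffer, instead of going through context.write():

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BulkPutMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(HBaseConfiguration.create(context.getConfiguration()), "mytable");
    table.setAutoFlush(false);                  // buffer puts on the client side
    table.setWriteBufferSize(8 * 1024 * 1024);  // flush roughly every 8 MB
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) throws IOException {
    List<Put> puts = new ArrayList<Put>();
    for (String field : value.toString().split(";")) {
      Put p = new Put(Bytes.toBytes(field));
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(field));
      puts.add(p);
    }
    table.put(puts);                            // one call inserts the whole list
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close();                              // also flushes any remaining buffered puts
  }
}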

On Mon, Jan 21, 2013 at 12:34 PM, Harsh J ha...@cloudera.com wrote:

 Hi Farrokh,

 This isn't an HDFS question - please ask these questions only on their
 relevant lists for best results and to keep each list's discussion separate.


 On Mon, Jan 21, 2013 at 11:40 AM, Farrokh Shahriari 
 mohandes.zebeleh...@gmail.com wrote:

 Hi there
 Is there any way to use an ArrayList of Puts in the map function to insert data
 into HBase? Because the context.write method doesn't accept an ArrayList of
 Puts, in every map call I can only put one row. What can I do to insert
 several rows in each map call?
 And also, how can I use autoflush and the client-side write buffer in the map
 function for inserting data into an HBase table?

 Mohandes Zebeleh




 --
 Harsh J



Re: HBase 0.94 shell throwing a NoSuchMethodError: hbase.util.Threads.sleep(I)V from ZK code

2013-01-21 Thread Jean-Marc Spaggiari
This error is strange.

The sleep method has been there in Threads for a long time now. OK, it was
(int millis) before and it's (long millis) now, but that should not make such
a difference.

tsuna, how is your setup configured? Do you run ZK locally? Or
standalone? What jars do you have for HBase and ZK?

It seems your ZooKeeperWatcher is loading a version of Threads which
doesn't have the sleep method. The sleep method appeared in the class in
April 2010... Do you have an old installation of the application
somewhere?
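
One generic way to confirm which jar that Threads class is actually coming from (a plain JVM diagnostic, not something suggested in this thread) would be:

// Prints the jar (or directory) the Threads class was loaded from,
// which helps spot a stale HBase installation on the classpath.
System.out.println(org.apache.hadoop.hbase.util.Threads.class
    .getProtectionDomain().getCodeSource().getLocation());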

JM

2013/1/21, lars hofhansl la...@apache.org:
 I suspect this is a different problem. Java will happily cast an int to a
 long where needed.
 Does  mvn clean install  fix this? If not, let's file a jira.

 -- Lars



 
  From: Ted Yu yuzhih...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, January 20, 2013 9:30 PM
 Subject: Re: HBase 0.94 shell throwing a NoSuchMethodError:
 hbase.util.Threads.sleep(I)V from ZK code

 Thanks for reporting this, Benoit.
 Here is the call:

           Threads.sleep(1);
 Here is the method to be called:

   public static void sleep(long millis) {

 Notice the mismatch in argument types: 1 being integer and millis being
 long.

 Cheers

 On Sun, Jan 20, 2013 at 9:01 PM, tsuna tsuna...@gmail.com wrote:

 I just updated my local tree (branch 0.94, SVN r1435317) and I see
 these spurious exceptions in the HBase shell:

 $ COMPRESSION=LZO HBASE_HOME=~/src/hbase ./src/create_table.sh
 HBase Shell; enter 'help<RETURN>' for list of supported commands.
 Type "exit<RETURN>" to leave the HBase Shell
 Version 0.94.4, r6034258c5573cc0185c8e979b5599f73662374ed, Sat Jan 19
 01:05:04 PST 2013

 create 'tsdb-uid',
   {NAME => 'id', COMPRESSION => 'LZO'},
   {NAME => 'name', COMPRESSION => 'LZO'}
 2013-01-20 20:56:20.116 java[41854:1203] Unable to load realm info
 from SCDynamicStore
 13/01/20 20:56:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.NoSuchMethodError:
 org.apache.hadoop.hbase.util.Threads.sleep(I)V
         at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:342)
         at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
         at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
         at
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 0 row(s) in 6.5040 seconds


 create 'tsdb',
   {NAME => 't', VERSIONS => 1, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
 0 row(s) in 1.0830 seconds

 I don't have any local changes.  Anyone else seeing this?  It doesn't
 seem to impact functionality (i.e. my table was created properly).

 --
 Benoit tsuna Sigoure



Errors for Hive Hbase org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation

2013-01-21 Thread Kotadiya,Kalpesh
Hi,

We have a setup of Hive on a sandbox environment; the queries work fine there 
and there are no errors. We have the same setup on production, where we are 
getting the following error from HBase.

These errors are showing up over and over again. Any idea why this error might be 
occurring? On one environment it works fine, while on the other it shows this 
error.

Error::

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.NullPointerException
at 
com.cerner.kepler.hive.KeplerCompositeKey.getField(KeplerCompositeKey.java:107)
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:218)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:219)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
 ]
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: java.lang.RuntimeException: java.io.IOException: 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@17a697a1
 closed
at 
com.cerner.kepler.util.hbase.InstrumentedHTableFactory.createHTableInterface(InstrumentedHTableFactory.java:197)
at 
com.cerner.kepler.util.hbase.HTablePool.createHTable(HTablePool.java:269)
at 
com.cerner.kepler.util.hbase.HTablePool.findOrCreateTable(HTablePool.java:200)
at com.cerner.kepler.util.hbase.HTablePool.getTable(HTablePool.java:175)
at com.cerner.kepler.util.hbase.HTablePool.getTable(HTablePool.java:218)
at 
com.cerner.kepler.entity.hbase.HBaseConnectionManager.getMetaTable(HBaseConnectionManager.java:81)
at 
com.cerner.kepler.entity.hbase.EntityTypeStore.getTypeFromId(EntityTypeStore.java:562)
at 
com.cerner.kepler.entity.hbase.EntityTypeStore.toEntityKey(EntityTypeStore.java:511)
at 
com.cerner.kepler.entity.hbase.HBaseKeyEncoder.toKey(HBaseKeyEncoder.java:45)
at 
com.cerner.kepler.hive.KeplerCompositeKey.init(KeplerCompositeKey.java:92)
at 
org.apache.hadoop.hive.hbase.LazyHBaseRow.uncheckedGetField(LazyHBaseRow.java:190)
at 
org.apache.hadoop.hive.hbase.LazyHBaseRow.getField(LazyHBaseRow.java:134)
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:218)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.evaluate(ExprNodeFieldEvaluator.java:80)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:64)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:40)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
at 
org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:118)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
... 9 more
Caused by: java.io.IOException: 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@17a697a1
 closed
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:794)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
at 

What does Found lingering reference file mean?

2013-01-21 Thread Jean-Marc Spaggiari
I get the issue below when I'm running hbck:

ERROR: Found lingering reference file
hdfs://node3:9000/hbase/entry_proposed/fbd1735591467005e53f48645278b006/recovered.edits/00091843039.temp

and I'm wondering what it means...

Thanks,

JM


[ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread Jonathan Hsieh
On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
and Nicholas Liochon as members of the Apache HBase PMC.

* Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
efforts, several hbck repairs, and the current  revamp of the
assignment manager.
* Nicolas (nkeywal) has been one of the drivers of unit testing
categorization and work improving the mean time to recovery in the
face of partial failure.

Please join me in congratulating Jimmy and Nicholas on their new roles!

Jon.

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// j...@cloudera.com


Re: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread Patrick Angeles
Congratz Jimmy and Nicholas... well deserved for both of you.


On Mon, Jan 21, 2013 at 3:56 PM, Jonathan Hsieh j...@cloudera.com wrote:

 On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
 and Nicholas Liochon as members of the Apache HBase PMC.

 * Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
 efforts, several hbck repairs, and the current  revamp of the
 assignment manager.
 * Nicolas (nkeywal) has been a one of the drivers of unit testing
 categorization and work improving the mean time to recovery in the
 face of partial failure.

 Please join me in congratulating Jimmy and Nicholas on their new roles!

 Jon.

 --
 // Jonathan Hsieh (shay)
 // Software Engineer, Cloudera
 // j...@cloudera.com



Re: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread Kevin O'dell
Awesome work!

On Mon, Jan 21, 2013 at 3:59 PM, Patrick Angeles
patrickange...@gmail.com wrote:

 Congratz Jimmy and Nicholas... well deserved for both of you.


 On Mon, Jan 21, 2013 at 3:56 PM, Jonathan Hsieh j...@cloudera.com wrote:

  On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
  and Nicholas Liochon as members of the Apache HBase PMC.
 
  * Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
  efforts, several hbck repairs, and the current  revamp of the
  assignment manager.
  * Nicolas (nkeywal) has been a one of the drivers of unit testing
  categorization and work improving the mean time to recovery in the
  face of partial failure.
 
  Please join me in congratulating Jimmy and Nicholas on their new roles!
 
  Jon.
 
  --
  // Jonathan Hsieh (shay)
  // Software Engineer, Cloudera
  // j...@cloudera.com
 




-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera


Re: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread Jesse Yates
Congrats fellas - great work!

- Jesse Yates

On Jan 21, 2013, at 12:56 PM, Jonathan Hsieh j...@cloudera.com wrote:

 On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
 and Nicholas Liochon as members of the Apache HBase PMC.
 
 * Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
 efforts, several hbck repairs, and the current  revamp of the
 assignment manager.
 * Nicolas (nkeywal) has been a one of the drivers of unit testing
 categorization and work improving the mean time to recovery in the
 face of partial failure.
 
 Please join me in congratulating Jimmy and Nicholas on their new roles!
 
 Jon.
 
 -- 
 // Jonathan Hsieh (shay)
 // Software Engineer, Cloudera
 // j...@cloudera.com


Re: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread Stack
Good on you lads!
St.Ack


On Mon, Jan 21, 2013 at 12:56 PM, Jonathan Hsieh j...@cloudera.com wrote:

 On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
 and Nicholas Liochon as members of the Apache HBase PMC.

 * Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
 efforts, several hbck repairs, and the current  revamp of the
 assignment manager.
 * Nicolas (nkeywal) has been a one of the drivers of unit testing
 categorization and work improving the mean time to recovery in the
 face of partial failure.

 Please join me in congratulating Jimmy and Nicholas on their new roles!

 Jon.

 --
 // Jonathan Hsieh (shay)
 // Software Engineer, Cloudera
 // j...@cloudera.com



Re: What does Found lingering reference file mean?

2013-01-21 Thread Stack
On Mon, Jan 21, 2013 at 12:01 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Found lingering reference file



The comment on the method that is finding the lingering reference files is
pretty good:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#604

It looks like a reference file that lost its referencee.

If you pass this arg., does it help?

http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#3391


St.Ack


Re: What does Found lingering reference file mean?

2013-01-21 Thread Jean-Marc Spaggiari
Hmm. It's still a bit obscure to me how this happened to my cluster...

-repair helped to fix that, so I'm now fine. I will re-run the job I
ran and see if this is happening again.

Thanks,

JM

2013/1/21, Stack st...@duboce.net:
 On Mon, Jan 21, 2013 at 12:01 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Found lingering reference file



 The comment on the method that is finding the lingering reference files is
 pretty good:
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#604

 It looks like a reference file that lost its referencee.

 If you pass this arg., does it help?

 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#3391


 St.Ack



Re: What does Found lingering reference file mean?

2013-01-21 Thread Stack
Did you get the name of the broken reference?  I'd trace its life in
namenode logs and in regionserver log by searching its name (You might have
to find the region in master logs to see where region landed over time).
The reference name includes the encoded region name as a suffix.  This is
the region that the reference 'references', so we need to figure out what
happened with it.  Did it get cleaned up before the reference was cleared?
 (Something that should not happen.)

St.Ack


On Mon, Jan 21, 2013 at 2:20 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hum. It's still a bit obscur for me how this happend to my cluster...

 -repair helped to fix that, so I'm now fine. I will re-run the job I
 ran and see if this is happening again.

 Thanks,

 JM

 2013/1/21, Stack st...@duboce.net:
  On Mon, Jan 21, 2013 at 12:01 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Found lingering reference file
 
 
 
  The comment on the method that is finding the lingering reference files
 is
  pretty good:
 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#604
 
  It looks like a reference file that lost its referencee.
 
  If you pass this arg., does it help?
 
 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#3391
 
 
  St.Ack
 



Re: What does Found lingering reference file mean?

2013-01-21 Thread Jimmy Xiang
RECOVERED_EDITS is not a column family.  It should be ignored by hbck.

Filed a jira:

https://issues.apache.org/jira/browse/HBASE-7640

Thanks,
Jimmy

On Mon, Jan 21, 2013 at 2:36 PM, Stack st...@duboce.net wrote:

 Did you get the name of the broken reference?  I'd trace its life in
 namenode logs and in regionserver log by searching its name (You might have
 to find the region in master logs to see where region landed over time).
 The reference name includes the encoded region name as a suffix.  This is
 the region that the reference 'references' so need to figure what
 happened with it.  Did it get cleaned up before reference was cleared?
  (Something that should not happen).

 St.Ack


 On Mon, Jan 21, 2013 at 2:20 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hum. It's still a bit obscur for me how this happend to my cluster...
 
  -repair helped to fix that, so I'm now fine. I will re-run the job I
  ran and see if this is happening again.
 
  Thanks,
 
  JM
 
  2013/1/21, Stack st...@duboce.net:
   On Mon, Jan 21, 2013 at 12:01 PM, Jean-Marc Spaggiari 
   jean-m...@spaggiari.org wrote:
  
   Found lingering reference file
  
  
  
   The comment on the method that is finding the lingering reference files
  is
   pretty good:
  
 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#604
  
   It looks like a reference file that lost its referencee.
  
   If you pass this arg., does it help?
  
  
 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/HBaseFsck.html#3391
  
  
   St.Ack
  
 



Re: What does Found lingering reference file mean?

2013-01-21 Thread Stack
On Mon, Jan 21, 2013 at 2:45 PM, Jimmy Xiang jxi...@cloudera.com wrote:

 RECOVERED_EDITS is not a column family.  It should be ignored by hbck.

 Filed a jira:

 https://issues.apache.org/jira/browse/HBASE-7640


Thanks Jimmy.  That makes sense now you mention it (smile).
St.Ack


Re: What does Found lingering reference file mean?

2013-01-21 Thread Jean-Marc Spaggiari
 Ok, so basically, there were no issues with my tables? I did not use
 any specific keywords for my CFs... They are all called @ or A ;)

2013/1/21, Stack st...@duboce.net:
 On Mon, Jan 21, 2013 at 2:45 PM, Jimmy Xiang jxi...@cloudera.com wrote:

 RECOVERED_EDITS is not a column family.  It should be ignored by hbck.

 Filed a jira:

 https://issues.apache.org/jira/browse/HBASE-7640


 Thanks Jimmy.  That makes sense now you mention it (smile).
 St.Ack



Fwd: confused about Data/Disk ratio

2013-01-21 Thread tgh
Thank you for your reply

 

I set the replication factor to 1, that is, no replication there; I use it for research.

 

And I have an observation, that is,

When you store a small amount of data in HBase, HBase will use a huge amount of
disk space, i.e., when HBase stores 3 million messages, which use 1 GB of disk as
text in the Linux FS, it will use 10 GB of disk in HBase,

While when you continue adding more data into HBase, HBase will use more
disk, but with a smaller addition, i.e., when HBase continues to store 200 million
messages, which use 60 GB of disk as text in the Linux FS, it will use 180 GB of
disk in HBase,

And when you continue this addition process, i.e., when HBase stores 6
billion messages, which use 2 TB of disk as text in the Linux FS, it will use 3 TB
of disk in HBase,

 

Do I make myself clear?

 

And I want to know why HBase uses 10 GB when there are only 3 million messages,

and why the disk usage does not grow linearly,

that is, it does not grow to 600 GB when HBase stores 200 million messages,

and it does not grow to 36 TB when there are 6 billion messages in HBase,
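
As a side note (not from this thread), one hedged way to see how much raw HDFS space the HBase root directory really consumes, independent of replication, is the FileSystem content-summary API; the /hbase path below assumes the default hbase.rootdir:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HBaseDiskUsage {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    ContentSummary cs = fs.getContentSummary(new Path("/hbase"));
    System.out.println("logical bytes stored:        " + cs.getLength());
    System.out.println("raw bytes incl. replication: " + cs.getSpaceConsumed());
  }
}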

 

I know it is a good feature of HBase for storing big data,

but I want to know why.

 

 

Could you help me

 

Thank you

-

Guanhua Tian

 

 

 

 

 

 

From: varun kumar [mailto:varun@gmail.com]
Sent: January 21, 2013 16:56
To: guanhua.t...@ia.ac.cn
Cc: user@hbase.apache.org
Subject: Re: confused about Data/Disk ratio

 

Hi Tian,

 

What is replication factor you mention in hdfs.

 

Regards,

Varun Kumar.P

 

On Mon, Jan 21, 2013 at 12:17 PM, tgh guanhua.t...@ia.ac.cn wrote:

Hi
I use hbase to store Data, and I have an observation, that is,
When hbase store 1Gb data, hdfs use 10Gb disk space, and when data
is 60Gb, hdfs use 180Gb disk, and when data is about 2Tb, hdfs use 3Tb disk,

That is, the ratio of data/disk is not a linear one, and why,

Could you help me


Thank you
-
Guanhua Tian








 

-- 

Regards,

Varun Kumar.P



Re: Storing images in Hbase

2013-01-21 Thread Varun Sharma
Thanks for the useful information. I wonder why you use only a 5 GB heap when
you have an 8 GB machine? Is there a reason not to use all of it (the
DataNode typically takes 1 GB of RAM)?

On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin magn...@gmail.com wrote:

 I forgot to mention that I also have this setup:

 <property>
   <name>hbase.hregion.memstore.flush.size</name>
   <value>33554432</value>
   <description>Flush more often. Default: 67108864</description>
 </property>

 This parameter works on per region amount, so this means if any of my
 400 (currently) regions on a regionserver has 30MB+ in memstore, the
 hbase will flush it to disk.


 Here are some metrics from a regionserver:

 requests=2, regions=370, stores=370, storefiles=1390,
 storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
 flushQueueSize=0, usedHeap=3516, maxHeap=4987,
 blockCacheSize=790656256, blockCacheFree=255245888,
 blockCacheCount=2436, blockCacheHitCount=218015828,
 blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
 blockCacheHitRatio=94, blockCacheHitCachingRatio=98

 Note, that memstore is only 2G, this particular regionserver HEAP is set
 to 5G.

 And last but not least, its very important to have good GC setup:

 export HBASE_OPTS=$HBASE_OPTS -verbose:gc -Xms5000m
 -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps
 -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
 -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
 -XX:+UseParNewGC \
 -XX:NewSize=128m -XX:MaxNewSize=128m \
 -XX:-UseAdaptiveSizePolicy \
 -XX:+CMSParallelRemarkEnabled \
 -XX:-TraceClassUnloading
 

 -Jack

 On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma va...@pinterest.com wrote:
  Hey Jack,
 
  Thanks for the useful information. By flush size being 15 %, do you mean
  the memstore flush size ? 15 % would mean close to 1G, have you seen any
  issues with flushes taking too long ?
 
  Thanks
  Varun
 
  On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin magn...@gmail.com wrote:
 
  That's right, Memstore size , not flush size is increased.  Filesize is
  10G. Overall write cache is 60% of heap and read cache is 20%.  Flush
 size
  is 15%.  64 maxlogs at 128MB. One namenode server, one secondary that
 can
  be promoted.  On the way to hbase images are written to a queue, so
 that we
  can take Hbase down for maintenance and still do inserts later.
  ImageShack
  has ‘perma cache’ servers that allows writes and serving of data even
 when
  hbase is down for hours, consider it 4th replica  outside of hadoop
 
  Jack
 
   *From:* Mohit Anchlia mohitanch...@gmail.com
  *Sent:* ‎January‎ ‎13‎, ‎2013 ‎7‎:‎48‎ ‎AM
  *To:* user@hbase.apache.org
  *Subject:* Re: Storing images in Hbase
 
  Thanks Jack for sharing this information. This definitely makes sense
 when
  using the type of caching layer. You mentioned about increasing write
  cache, I am assuming you had to increase the following parameters in
  addition to increase the memstore size:
 
  hbase.hregion.max.filesize
  hbase.hregion.memstore.flush.size
 
  On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin magn...@gmail.com wrote:
 
   We buffer all accesses to HBASE with Varnish SSD based caching layer.
   So the impact for reads is negligible.  We have 70 node cluster, 8 GB
   of RAM per node, relatively weak nodes (intel core 2 duo), with
   10-12TB per server of disks.  Inserting 600,000 images per day.  We
   have relatively little of compaction activity as we made our write
   cache much larger than read cache - so we don't experience region file
   fragmentation as much.
  
   -Jack
  
   On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia 
 mohitanch...@gmail.com
   wrote:
I think it really depends on volume of the traffic, data
 distribution
  per
region, how and when files compaction occurs, number of nodes in the
cluster. In my experience when it comes to blob data where you are
   serving
10s of thousand+ requests/sec writes and reads then it's very
 difficult
   to
manage HBase without very hard operations and maintenance in play.
 Jack
earlier mentioned they have 1 billion images, It would be
 interesting
  to
know what they see in terms of compaction, no of requests per sec.
 I'd
  be
surprised that in high volume site it can be done without any
 Caching
   layer
on the top to alleviate IO spikes that occurs because of GC and
   compactions.
   
On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq donta...@gmail.com
 
   wrote:
   
IMHO, if the image files are not too huge, Hbase can efficiently
 serve
   the
purpose. You can store some additional info along with the file
   depending
upon your search criteria to make the search faster. Say if you
 want
  to
fetch images by the type, you can store images in one column and
 its
extension in another column(jpg, tiff etc).
   
BTW, what exactly is the problem which you are facing. You have
  written
But I still cant do it?
   
Warm Regards,
   

Re: Storing images in Hbase

2013-01-21 Thread Varun Sharma
On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma va...@pinterest.com wrote:

 Thanks for the useful information. I wonder why you use only 5G heap when
 you have an 8G machine ? Is there a reason to not use all of it (the
 DataNode typically takes a 1G of RAM)


 On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin magn...@gmail.com wrote:

 I forgot to mention that I also have this setup:

 <property>
   <name>hbase.hregion.memstore.flush.size</name>
   <value>33554432</value>
   <description>Flush more often. Default: 67108864</description>
 </property>

 This parameter works on per region amount, so this means if any of my
 400 (currently) regions on a regionserver has 30MB+ in memstore, the
 hbase will flush it to disk.


 Here are some metrics from a regionserver:

 requests=2, regions=370, stores=370, storefiles=1390,
 storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
 flushQueueSize=0, usedHeap=3516, maxHeap=4987,
 blockCacheSize=790656256, blockCacheFree=255245888,
 blockCacheCount=2436, blockCacheHitCount=218015828,
 blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
 blockCacheHitRatio=94, blockCacheHitCachingRatio=98

 Note, that memstore is only 2G, this particular regionserver HEAP is set
 to 5G.

 And last but not least, its very important to have good GC setup:

 export HBASE_OPTS=$HBASE_OPTS -verbose:gc -Xms5000m
 -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps
 -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
 -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
 -XX:+UseParNewGC \
 -XX:NewSize=128m -XX:MaxNewSize=128m \
 -XX:-UseAdaptiveSizePolicy \
 -XX:+CMSParallelRemarkEnabled \
 -XX:-TraceClassUnloading
 

 -Jack

 On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma va...@pinterest.com
 wrote:
  Hey Jack,
 
  Thanks for the useful information. By flush size being 15 %, do you mean
  the memstore flush size ? 15 % would mean close to 1G, have you seen any
  issues with flushes taking too long ?
 
  Thanks
  Varun
 
  On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin magn...@gmail.com wrote:
 
  That's right, Memstore size , not flush size is increased.  Filesize is
  10G. Overall write cache is 60% of heap and read cache is 20%.  Flush
 size
  is 15%.  64 maxlogs at 128MB. One namenode server, one secondary that
 can
  be promoted.  On the way to hbase images are written to a queue, so
 that we
  can take Hbase down for maintenance and still do inserts later.
  ImageShack
  has ‘perma cache’ servers that allows writes and serving of data even
 when
  hbase is down for hours, consider it 4th replica  outside of hadoop
 
  Jack
 
   *From:* Mohit Anchlia mohitanch...@gmail.com
  *Sent:* ‎January‎ ‎13‎, ‎2013 ‎7‎:‎48‎ ‎AM
  *To:* user@hbase.apache.org
  *Subject:* Re: Storing images in Hbase
 
  Thanks Jack for sharing this information. This definitely makes sense
 when
  using the type of caching layer. You mentioned about increasing write
  cache, I am assuming you had to increase the following parameters in
  addition to increase the memstore size:
 
  hbase.hregion.max.filesize
  hbase.hregion.memstore.flush.size
 
  On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin magn...@gmail.com wrote:
 
   We buffer all accesses to HBASE with Varnish SSD based caching layer.
   So the impact for reads is negligible.  We have 70 node cluster, 8 GB
   of RAM per node, relatively weak nodes (intel core 2 duo), with
   10-12TB per server of disks.  Inserting 600,000 images per day.  We
   have relatively little of compaction activity as we made our write
   cache much larger than read cache - so we don't experience region
 file
   fragmentation as much.
  
   -Jack
  
   On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia 
 mohitanch...@gmail.com
   wrote:
I think it really depends on volume of the traffic, data
 distribution
  per
region, how and when files compaction occurs, number of nodes in
 the
cluster. In my experience when it comes to blob data where you are
   serving
10s of thousand+ requests/sec writes and reads then it's very
 difficult
   to
manage HBase without very hard operations and maintenance in play.
 Jack
earlier mentioned they have 1 billion images, It would be
 interesting
  to
know what they see in terms of compaction, no of requests per sec.
 I'd
  be
surprised that in high volume site it can be done without any
 Caching
   layer
on the top to alleviate IO spikes that occurs because of GC and
   compactions.
   
On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq 
 donta...@gmail.com
   wrote:
   
IMHO, if the image files are not too huge, Hbase can efficiently
 serve
   the
purpose. You can store some additional info along with the file
   depending
upon your search criteria to make the search faster. Say if you
 want
  to
fetch images by the type, you can store images in one column and
 its
extension in another column(jpg, tiff etc).
   
BTW, what exactly is the problem which you 

Re: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas Liochon

2013-01-21 Thread lars hofhansl
BTW, here's a list with all current PMC members:

http://people.apache.org/committers-by-project.html#hbase-pmc




 From: Jonathan Hsieh j...@cloudera.com
To: user@hbase.apache.org; d...@hbase.apache.org 
Sent: Monday, January 21, 2013 12:56 PM
Subject: [ANNOUNCE] New Apache HBase PMC members: Jimmy Xiang and Nicolas 
Liochon
 
On behalf of the Apache HBase PMC, I am excited to welcome Jimmy Xiang
and Nicholas Liochon as members of the Apache HBase PMC.

* Jimmy (jxiang) has been one of the drivers on the RPC protobuf'ing
efforts, several hbck repairs, and the current  revamp of the
assignment manager.
* Nicolas (nkeywal) has been a one of the drivers of unit testing
categorization and work improving the mean time to recovery in the
face of partial failure.

Please join me in congratulating Jimmy and Nicholas on their new roles!

Jon.

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// j...@cloudera.com

Re: What does Found lingering reference file mean?

2013-01-21 Thread Stack
That's right.  It's a bug in hbck that it thinks recovered.edits is a CF.
St.Ack


On Mon, Jan 21, 2013 at 4:03 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Ok, so basically, there was no issues with my tables? I did not used
 any specific keywors for my CF... They are all called @ or A ;)

 2013/1/21, Stack st...@duboce.net:
  On Mon, Jan 21, 2013 at 2:45 PM, Jimmy Xiang jxi...@cloudera.com
 wrote:
 
  RECOVERED_EDITS is not a column family.  It should be ignored by hbck.
 
  Filed a jira:
 
  https://issues.apache.org/jira/browse/HBASE-7640
 
 
  Thanks Jimmy.  That makes sense now you mention it (smile).
  St.Ack