Re: HBase and unit tests

2012-08-31 Thread n keywal
Hi Cristofer,

HBase starts a minicluster for many of its tests because we have a lot of
destructive tests; otherwise the non-destructive tests would be impacted by
the destructive ones. When writing a client application, you usually don't
need to do that: you can rely on the same instance for all your tests.

It's also useful to write the tests in a way that is compatible with a real
cluster or a pseudo-distributed one. Sometimes, when a test fails, you want
to have a look at what the code wrote or found in HBase: you won't have
this with a mini cluster. And it saves a startup.

I don't know if there is a blog entry on this, but it's not very difficult
to do (though, as usual, not that easy when you start). I've personally done
it with a singleton class + prefixing the table names with a random key (to
allow multiple tests in parallel on the same cluster without relying on
cleanup) + getProperty to decide between starting a mini cluster and
connecting to an existing cluster.
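
A rough sketch of that setup, using only the JDK so that it runs on its
own. Everything HBase-specific is stubbed out, and the names
(SharedTestCluster, ClusterHandle, the test.hbase.mode property) are
made up for illustration; they are not part of HBase. A real version would
start the mini cluster or build a client configuration in the two branches
of the constructor.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for whatever the tests talk to (mini cluster or real cluster). */
interface ClusterHandle {
    String describe();
}

/**
 * Hypothetical shared test environment: one instance per JVM, created lazily.
 * Both start-up branches are stubs in this sketch.
 */
final class SharedTestCluster {
    /** Hypothetical property name; pass -Dtest.hbase.mode=external to switch. */
    static final String MODE_PROPERTY = "test.hbase.mode";

    /** Counts how many times the (stubbed) environment was set up. */
    static final AtomicInteger STARTS = new AtomicInteger();

    private static SharedTestCluster instance;

    private final ClusterHandle handle;

    // Random key chosen once per JVM, so parallel runs get different table names.
    private final String prefix =
        "t" + UUID.randomUUID().toString().replace("-", "").substring(0, 8) + "_";

    private SharedTestCluster(String mode) {
        STARTS.incrementAndGet();
        if ("external".equals(mode)) {
            handle = () -> "external cluster (stub: would connect)";
        } else {
            handle = () -> "mini cluster (stub: would start one)";
        }
    }

    /** Returns the single shared instance, creating it on first use. */
    static synchronized SharedTestCluster get() {
        if (instance == null) {
            instance = new SharedTestCluster(System.getProperty(MODE_PROPERTY, "mini"));
        }
        return instance;
    }

    ClusterHandle handle() {
        return handle;
    }

    /** Prefixes a logical table name so that concurrent runs do not collide. */
    String tableName(String logicalName) {
        return prefix + logicalName;
    }
}

final class SharedTestClusterDemo {
    public static void main(String[] args) {
        SharedTestCluster cluster = SharedTestCluster.get();
        System.out.println(cluster.handle().describe());
        System.out.println(cluster.tableName("users"));
    }
}
```

Every test asks SharedTestCluster.get() for the environment and
tableName(...) for its tables, so the same test code runs unchanged against
either kind of cluster.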

HTH,

Nicolas


On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber 
cristofer.we...@neogrid.com wrote:

 Hi Sonal, Stack and Ulrich!

 Yes, I should provide more details :$

 I reached the links you provided when I was searching for a way to start
 HBase with JUnit. From the defaults, the only params I have changed are
 the ZooKeeper port and the number of nodes, which is 1 in my case. Based on
 the logs I suspect that most of the time is spent on HDFS, and that's why I
 asked if there is a way to start a standalone instance of HBase. The amount
 of data written by each test case would probably fit in the memstore anyway,
 and table cleanup between test methods is handled by a loop of deletes.

 At least 15 seconds are spent on starting the mini cluster for each test
 case.

 I just remembered that I should turn off the WAL when running unit tests
 :-), but this will not affect startup time.

 Thanks!!

 Best regards,
 Cristofer

 



Re: HBase and unit tests

2012-08-31 Thread Ulrich Staudinger
Hi Cristofer,

 At least 15 seconds are spent on starting the mini cluster for each test
 case.

And are you sure that you are reusing your mini cluster across unit tests?


HTH2,
Ulrich







-- 
Ulrich Staudinger

http://www.activequant.com
Connect online: https://www.xing.com/profile/Ulrich_Staudinger


Re: HBase and unit tests

2012-08-31 Thread n keywal
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber 
cristofer.we...@neogrid.com wrote:

 For the other adapters (Cassandra, Cassandra + Thrift, Cassandra +
 Astyanax, etc.) they managed to run the unit tests as Internal and External,
 and they also have a profile for Performance and Concurrent tests. External
 and Performance/Concurrent run against a live database instance; only the
 Internal tests are expected to start a database per test case, while
 otherwise being the same tests as External. The HBase adapter already has
 External and Performance/Concurrent, so I'm trying to provide the Internal
 set, where the objective is to test the Titan|HBase interaction.


Understood, thanks for sharing the context.

 And my goal is to achieve better times than Cassandra :-)

 Singleton seems to be a good option, but I have to check whether Maven
 Surefire can keep the same process across JUnit test cases.


It should be OK with the parameter forkMode=once in Surefire.

 Because Titan works with adapters for different databases and manages
 table/CF creation when they do not exist, I think it will not be possible
 to prefix table names per test without changing some core components of
 Titan, which seems too invasive to do now. Deletion is fast enough, so we
 can keep the same table.


Prefixing is useful on an external cluster, as you can't fully rely on the
cleanup when a test fails nastily, or if you want to analyse the content
afterwards. It's less of an issue on a mini cluster, as it's recreated
between test runs.

 Thanks!!


You're welcome. Keep us updated, and tell us if you have issues.


Re: HBase and unit tests

2012-08-30 Thread Sonal Goyal
Hi Cristofer,

Are you using some specific configs with the MiniCluster?

We are using HBaseTestingUtility and have a reasonable test run time. You
can check
https://github.com/sonalgoyal/crux/blob/master/src/test/java/co/nubetech/crux/server/TestHBaseFacade.java

Best Regards,
Sonal
Crux: Reporting for HBase https://github.com/sonalgoyal/crux
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Fri, Aug 31, 2012 at 5:14 AM, Cristofer Weber 
cristofer.we...@neogrid.com wrote:

 Hi there!

 After I started studying HBase, I searched for open source projects
 backed by HBase and found the Titan distributed graph database (you have
 probably heard about it). As soon as I read in their documentation that the
 HBase adapter is experimental and suboptimal (disclaimer here:
 https://github.com/thinkaurelius/titan/wiki/Using-HBase) I volunteered to
 help improve this adapter, and since then I have made a few changes to
 speed up the tests (reduced from hours to minutes) and also an improvement
 to the search feature.

 Now I'm trying to break the dependency on a pre-installed HBase for unit
 tests. I found miniCluster inside the HBase tests, but it takes too much
 time to start and I don't know if tweaking the configs will improve this
 significantly. Is there a way to start a 'lightweight' instance, like
 programmatically starting a standalone instance?

 Thanks!!

 Best regards,
 Cristofer


Re: HBase and unit tests

2012-08-30 Thread Stack
On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber
cristofer.we...@neogrid.com wrote:
 Hi there!

 After I started studying HBase, I searched for open source projects
 backed by HBase and found the Titan distributed graph database (you have
 probably heard about it). As soon as I read in their documentation that the
 HBase adapter is experimental and suboptimal (disclaimer here:
 https://github.com/thinkaurelius/titan/wiki/Using-HBase) I volunteered to
 help improve this adapter, and since then I have made a few changes to
 speed up the tests (reduced from hours to minutes) and also an improvement
 to the search feature.

 Now I'm trying to break the dependency on a pre-installed HBase for unit
 tests. I found miniCluster inside the HBase tests, but it takes too much
 time to start and I don't know if tweaking the configs will improve this
 significantly. Is there a way to start a 'lightweight' instance, like
 programmatically starting a standalone instance?


How much is 'too much time' Cristofer?  Do you want a standalone cluster at all?
St.Ack
P.S. If digging in this area, you might find the blog post by the
sematextians of use:
http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/


Re: HBase and unit tests

2012-08-30 Thread Ulrich Staudinger
As general advice, although you probably already take care of this:
instantiate the mini cluster only once per JUnit test class (for example in
a @BeforeClass method; JUnit runs the constructor before every test method),
not in every test method. At the end of each test, either clean up your
HBase data or use a different area per test.
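
To make the "different area per test" option concrete, here is a small
sketch that runs on the JDK alone. FakeStore is an in-memory stand-in for
the shared HBase instance, and the area naming scheme (test1, test2, ...) is
made up for illustration; with real HBase an area could be a table name
prefix or a row key prefix.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a shared store; a real test would use HBase tables. */
final class FakeStore {
    private final Map<String, String> rows = new TreeMap<>();

    void put(String area, String row, String value) {
        rows.put(area + ":" + row, value);
    }

    /** Counts the rows visible inside one area only. */
    long countIn(String area) {
        return rows.keySet().stream().filter(k -> k.startsWith(area + ":")).count();
    }

    /** Cleanup option: delete everything a test wrote in its own area. */
    void deleteArea(String area) {
        rows.keySet().removeIf(k -> k.startsWith(area + ":"));
    }
}

final class PerTestAreas {
    private static final AtomicInteger NEXT = new AtomicInteger();

    /** Hands out a fresh area name per test (hypothetical naming scheme). */
    static String newArea() {
        return "test" + NEXT.incrementAndGet();
    }
}

final class PerTestAreasDemo {
    public static void main(String[] args) {
        FakeStore store = new FakeStore(); // created once, shared by all tests

        String first = PerTestAreas.newArea();
        store.put(first, "row1", "left behind by a test that failed before cleanup");

        String second = PerTestAreas.newArea();
        store.put(second, "row1", "fresh");

        System.out.println(store.countIn(second)); // the second test sees only its own row
        store.deleteArea(second);                  // normal cleanup at the end of a test
        System.out.println(store.countIn(first));  // leftovers stay available for inspection
    }
}
```

With separate areas, a test that dies before its cleanup does not disturb
the next one, and whatever it wrote can still be inspected afterwards.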

best regards,
ulrich


--
connect on xing or linkedin. sent from my tablet.
