Re: HBase and unit tests
Hi Cristofer,

HBase starts a minicluster for many of its tests because we have a lot of destructive tests; otherwise the non-destructive tests would be affected by the destructive ones. When writing a client application, you usually don't need to do that: you can rely on the same instance for all your tests.

It's also useful to write the tests in a way that is compatible with a real cluster or a pseudo-distributed one. Sometimes, when a test fails, you want to have a look at what the code wrote or found in HBase, and you can't do that with a mini cluster. It also saves a cluster start.

I don't know if there is a blog entry on this, but it's not very difficult to do (though, as usual, not that easy when you start). I've personally done it with a singleton class + prefixing the table names with a random key (to allow multiple tests in parallel on the same cluster without relying on cleanup) + getProperty to decide between starting a mini cluster or connecting to a cluster.

HTH,
Nicolas

On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber cristofer.we...@neogrid.com wrote:

> Hi Sonal, Stack and Ulrich!
>
> Yes, I should provide more details :$
>
> I had already reached the links you provided when I was searching for a way to start HBase with JUnit. Apart from the defaults, the only parameters I have changed are the ZooKeeper port and the number of nodes, which is 1 in my case. Based on the logs I suspect that most of the time is spent on HDFS, and that's why I asked if there is a way to start a standalone instance of HBase. The amount of data written in each test case would probably fit in the memstore anyway, and table cleanup between test methods is handled by a loop of deletes.
>
> At least 15 seconds are spent on starting the mini cluster for each test case.
>
> I just remembered that I should turn off the WAL when running unit tests :-), but that will not affect startup time.
>
> Thanks!!
>
> Best regards,
> Cristofer
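A minimal sketch of the setup Nicolas describes (singleton, random table-name prefix, getProperty switch) might look like the following. It is not his actual code. The class name `TestCluster`, the property name `test.hbase.mode`, and the prefix format are all hypothetical, and the place where a real mini cluster would be started or a real connection opened is only marked with a comment so that the example runs without HBase on the classpath.

```java
import java.util.UUID;

/** Process-wide holder for test cluster settings (hypothetical helper, not part of HBase). */
final class TestCluster {
    /** Hypothetical property name; run with -Dtest.hbase.mode=external to target a running cluster. */
    static final String MODE_PROPERTY = "test.hbase.mode";

    enum Mode { MINI, EXTERNAL }

    private static TestCluster instance;

    private final Mode mode;
    private final String runPrefix;

    private TestCluster(Mode mode) {
        this.mode = mode;
        // Random per-run prefix so that parallel runs on one cluster never share tables
        // and a failed cleanup from an earlier run cannot interfere.
        this.runPrefix = "t" + UUID.randomUUID().toString().replace("-", "").substring(0, 8) + "_";
        // Assumption: in MINI mode a mini cluster would be started here, once per JVM
        // (for example with HBaseTestingUtility); in EXTERNAL mode the client would
        // connect using the cluster configuration found on the classpath.
    }

    /** Returns the single instance, creating it on first use. */
    static synchronized TestCluster get() {
        if (instance == null) {
            String raw = System.getProperty(MODE_PROPERTY, "mini");
            instance = new TestCluster("external".equalsIgnoreCase(raw) ? Mode.EXTERNAL : Mode.MINI);
        }
        return instance;
    }

    Mode mode() {
        return mode;
    }

    /** Maps a logical table name to the name used for this test run. */
    String tableName(String base) {
        return runPrefix + base;
    }
}

class TestClusterDemo {
    public static void main(String[] args) {
        TestCluster cluster = TestCluster.get();
        System.out.println(cluster.mode() + " " + cluster.tableName("edges"));
    }
}
```

Every test asks `TestCluster.get()` for its table names, so the same test code can run against a mini cluster or a shared cluster, and the tables left behind by a failed run stay available for inspection under their prefix.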
Re: HBase and unit tests
Hi Cristofer,

> At least 15 seconds are spent on starting the mini cluster for each test case.

And you are sure that you are reusing your mini cluster across unit tests?

HTH2,
Ulrich

--
Ulrich Staudinger
http://www.activequant.com
Connect online: https://www.xing.com/profile/Ulrich_Staudinger
Re: HBase and unit tests
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber cristofer.we...@neogrid.com wrote:

> For the other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, etc.) they managed to run the unit tests as Internal and External, and they also have a profile for Performance and Concurrent tests. External and Performance/Concurrent run against a live database instance; only the Internal tests are expected to start a database per test case, while otherwise being the same tests as External. The HBase adapter already has External and Performance/Concurrent, so I'm trying to provide the Internal set, where the objective is to test the Titan|HBase interaction.

Understood, thanks for sharing the context.

> And my goal is to achieve better times than Cassandra :-) A singleton seems to be a good option, but I have to check whether Maven Surefire can keep the same process between JUnit test cases.

It should be ok with the parameter forkMode=once in Surefire.

> Because Titan works with adapters for different databases and manages table/CF creation when they do not exist, I think it will not be possible to prefix table names per test without changing some core components of Titan, and that seems too invasive to change now. Deletion is fast enough, so we can keep the same table.

The prefix is useful on an external cluster, as you can't fully rely on the cleanup when a test fails nastily, or if you want to analyse the content. It won't be such an issue on a mini cluster, as it's recreated between test runs.

> Thanks!!

You're welcome. Keep us updated, and tell us if you have issues.
Re: HBase and unit tests
Hi Cristofer,

Are you using some specific configs with the MiniCluster? We are using HBaseTestingUtility and have a reasonable test run time. You can check https://github.com/sonalgoyal/crux/blob/master/src/test/java/co/nubetech/crux/server/TestHBaseFacade.java

Best Regards,
Sonal

Crux: Reporting for HBase https://github.com/sonalgoyal/crux
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
Re: HBase and unit tests
On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber cristofer.we...@neogrid.com wrote:

> Hi there!
>
> After I started studying HBase, I searched for open source projects backed by HBase and found the Titan distributed graph database (you have probably heard about it). As soon as I read in their documentation that the HBase adapter is experimental and suboptimal (disclaimer here: https://github.com/thinkaurelius/titan/wiki/Using-HBase), I volunteered to help improve this adapter. Since then I have made a few changes to speed up the tests (reduced from hours to minutes) and also an improvement to the search feature.
>
> Now I'm trying to break the dependency on a pre-installed HBase for unit tests. I found the miniCluster inside the HBase tests, but it takes too much time to start, and I don't know whether tweaking the configs will improve this significantly. Is there a way to start a 'lightweight' instance, like programmatically starting a standalone instance?

How much is 'too much time', Cristofer? Do you want a standalone cluster at all?

St.Ack

P.S. If digging in this area, you might find the blog post by the sematextians of use: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
Re: HBase and unit tests
As general advice, although you probably already take care of this: instantiate the mini cluster only once per JUnit test class, not in every test method. At the end of each test, either clean up your HBase or use a different area per test.

Best regards,
Ulrich
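To illustrate the two pieces of advice above (start once, then clean up or use a separate area per test), here is a small sketch. It makes several assumptions: a `TreeMap` stands in for a sorted HBase table, `SharedStore` and its methods are hypothetical names, and the "area" is simply a row-key prefix derived from the test name. One related point: JUnit 4 creates a new instance of the test class for every test method, so once-only startup normally goes into a static `@BeforeClass` method or a shared static holder like the one below, not into an instance constructor.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for one sorted table shared by all tests; not the HBase client API. */
final class SharedStore {
    private static final NavigableMap<String, String> ROWS = new TreeMap<>();
    private static boolean started = false;
    private static int starts = 0;

    /** Starts the (simulated) cluster on the first call only; later calls are no-ops. */
    static synchronized void ensureStarted() {
        if (!started) {
            started = true;
            starts++; // a real setup would pay the expensive startup cost here, once
        }
    }

    static synchronized int starts() {
        return starts;
    }

    /** Writes into the area owned by the given test. */
    static synchronized void put(String testName, String key, String value) {
        ROWS.put(testName + "/" + key, value);
    }

    /** Number of rows currently stored in the given test's area. */
    static synchronized int count(String testName) {
        return area(testName).size();
    }

    /** Deletes only the rows in the given test's area. */
    static synchronized void cleanup(String testName) {
        area(testName).clear();
    }

    // View of the keys in [testName + "/", testName + "/\uFFFF").
    // Assumes test names contain no '/', which holds for Java method names.
    private static NavigableMap<String, String> area(String testName) {
        String from = testName + "/";
        return ROWS.subMap(from, true, from + Character.MAX_VALUE, false);
    }
}

class SharedStoreDemo {
    public static void main(String[] args) {
        SharedStore.ensureStarted();
        SharedStore.ensureStarted(); // second call does not start anything again
        SharedStore.put("demoA", "row1", "x");
        SharedStore.put("demoB", "row1", "y");
        SharedStore.cleanup("demoA"); // leaves demoB untouched
        System.out.println(SharedStore.starts() + " start(s), demoA rows: "
                + SharedStore.count("demoA") + ", demoB rows: " + SharedStore.count("demoB"));
    }
}
```

With separate areas, a test that fails before its cleanup cannot pollute the rows another test reads, which is the same reason given elsewhere in this thread for prefixing table names on a shared cluster.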