Re: auto flush
Hi Eric, You are right on both issues. There should be an option to turn off auto-flushing when creating the HBaseStore; I think the gora properties file is a nice place for this option. And indeed, a flush should flush all thread-local instances. I implemented the thread-local code for the HBaseStore because at the time there was no synchronization at all for the HBaseStore. The solution for the flushing problem would be to use a read-write lock, so that all threads can use their HTable instance freely (read lock) except when flushing, which blocks all actions until the flush is done (write lock). That way all HTable instances will get flushed. A nicer solution would be to see if there are other (best-practice) ways to implement multithreaded HBase client code. I will shortly file Jira issues and patch them. Ferdy. On Wed, Apr 4, 2012 at 2:45 AM, Eric Newton eric.new...@gmail.com wrote: Hi Lewis, I changed gora-hbase in my local copy to always turn off auto flush, and then I flush as needed. It looks as though I might need to make sure that I flush with the same thread I write with, since it is using thread-local storage to pick up the correct client interface. I don't see that as a major problem; it's just a surprising result of the way the connection was implemented. I could be wrong, though. I only spent about 10 minutes looking into it, in order to figure out why HBase was slower than I expected. -Eric On Tue, Apr 3, 2012 at 4:23 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Eric, On Tue, Apr 3, 2012 at 4:04 PM, Eric Newton eric.new...@gmail.com wrote: Is there any particular reason that gora-hbase uses auto flush on every HTable connection? I'm not using the HBase module at the moment, so will wait for others to chime in; however, maybe you could comment on an alternative implementation (or simply removing auto flush and making the flush configurable instead?). I would be really interested to hear your comments.
This makes Keith Turner's excellent goraci test run very slowly for hbase. Well, it's something we should defo look into then. Hopefully we can actually get this test suite integrated into our CI build soon; until then, thanks for pointing out the problem. Also, I was unable to subscribe using user-subscr...@gora.apache.org. Yeah, currently we don't have a user list; all traffic has been coming through dev@. I was actually going to wait until after our next release before getting user@ sorted, as most of the work going on since graduation has been development. I'll progress with logging an issue with INFRA though; thanks for pointing this out. Lewis
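Ferdy's read-write lock idea could look roughly like the minimal Java sketch below. It does not use the real HBase client: `BufferedClient` is a hypothetical stand-in for an `HTable` with auto-flush turned off, and the class and method names (`ThreadLocalFlushSketch`, `put`, `flushAll`, `flushedCount`) are illustrative assumptions, not Gora's `HBaseStore` API. Each thread lazily gets its own client from a `ThreadLocal`, which also registers it in a shared set; writes hold the shared read lock, and `flushAll()` holds the exclusive write lock while it flushes every registered client. That addresses Eric's observation that a flush otherwise only reaches the calling thread's instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ThreadLocalFlushSketch {

    // Hypothetical stand-in for an HTable with auto-flush disabled:
    // puts are buffered locally until flush() is called.
    static final class BufferedClient {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> sink; // shared "server side", guarded by its own monitor

        BufferedClient(List<String> sink) {
            this.sink = sink;
        }

        void put(String row) {
            buffer.add(row);
        }

        void flush() {
            synchronized (sink) {
                sink.addAll(buffer);
            }
            buffer.clear();
        }
    }

    private final List<String> sink = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Every per-thread client is registered here so flushAll() can reach it.
    private final Set<BufferedClient> allClients = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<BufferedClient> local = ThreadLocal.withInitial(() -> {
        BufferedClient client = new BufferedClient(sink);
        allClients.add(client);
        return client;
    });

    // Writers take the shared read lock, so they never block one another;
    // each thread only touches its own buffer.
    public void put(String row) {
        lock.readLock().lock();
        try {
            local.get().put(row);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The exclusive write lock waits for in-flight puts to finish and blocks
    // new ones, so every thread's buffer can be flushed safely from here.
    public void flushAll() {
        lock.writeLock().lock();
        try {
            for (BufferedClient client : allClients) {
                client.flush();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int flushedCount() {
        synchronized (sink) {
            return sink.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalFlushSketch store = new ThreadLocalFlushSketch();
        Thread w1 = new Thread(() -> { for (int i = 0; i < 3; i++) store.put("w1-" + i); });
        Thread w2 = new Thread(() -> { for (int i = 0; i < 2; i++) store.put("w2-" + i); });
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        System.out.println(store.flushedCount()); // prints 0: rows are still buffered per thread
        // The main thread wrote nothing itself, yet flushAll() still pushes out
        // the rows buffered by w1 and w2.
        store.flushAll();
        System.out.println(store.flushedCount()); // prints 5
    }
}
```

One caveat of this design: clients belonging to threads that have finished stay in the registry, so a real implementation would also need to close and deregister them (for example when a worker thread exits).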
Re: auto flush
Thanks for dropping in here, Ferdy. On Wed, Apr 4, 2012 at 9:34 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote: There should be an option to turn off auto-flushing when creating the HBaseStore. I think the gora properties file is a nice place for this option. +1 The solution for the flushing problem would be to use a read-write lock, so that all threads can use their HTable instance freely (read lock) except when flushing, which blocks all actions until the flush is done (write lock). That way all HTable instances will get flushed. Sounds reasonable, but yeah, I'm all for doing a bit of investigation into best practice for this, as you mention below. A nicer solution would be to see if there are other (best-practice) ways to implement multithreaded HBase client code. Lewis
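A minimal sketch of how the proposed gora.properties option might be read, assuming a hypothetical key `gora.hbasestore.autoflush` (the key name and the `AutoFlushConfig` class are illustrative, not existing Gora code). The resulting flag would then presumably be passed to `HTable.setAutoFlush(boolean)` when each table instance is created; that HBase call is left out so the example runs without HBase on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public final class AutoFlushConfig {

    // Hypothetical key name, used here for illustration only.
    public static final String AUTO_FLUSH_KEY = "gora.hbasestore.autoflush";

    private AutoFlushConfig() {}

    // Defaults to true (the current always-auto-flush behaviour) when the key
    // is absent, and rejects anything other than true/false (case-insensitive)
    // so that a typo does not silently change write behaviour.
    public static boolean autoFlushEnabled(Properties props) {
        String value = props.getProperty(AUTO_FLUSH_KEY, "true").trim();
        if (value.equalsIgnoreCase("true")) {
            return true;
        }
        if (value.equalsIgnoreCase("false")) {
            return false;
        }
        throw new IllegalArgumentException(
            AUTO_FLUSH_KEY + " must be true or false, got: " + value);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(AUTO_FLUSH_KEY + "=false\n"));
        System.out.println(autoFlushEnabled(props));            // prints false
        System.out.println(autoFlushEnabled(new Properties())); // prints true
    }
}
```

Defaulting to `true` keeps the current behaviour for existing users; strict parsing is a deliberate choice over `Boolean.parseBoolean`, which would quietly treat a misspelled value as `false`.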
Re: Republish Gora trunk Javadoc
Aye it does. I'll commit this today and write our report as well. Ta Chris. lewis On Wed, Apr 4, 2012 at 2:31 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Lewis, I think you can do: 1. mvn javadoc:aggregate (from top level) 2. cp -R target/site/apidocs ../site/publish/apidocs-X.Y 3. cd ../site/publish; svn commit -m ... Make sense? Cheers, Chris On Apr 3, 2012, at 3:09 PM, Lewis John Mcgibbney wrote: Hi Guys, I've published site documentation and have included my experiences of doing so on our wiki, but would like to republish the Javadoc, as there have been some recent commits that I would like to get pushed for others to view via Javadoc. Can anyone provide details of how I go about doing this? Thanks Lewis -- *Lewis* ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -- *Lewis*
Creation of user@ lists
Hi Guys, I just opened https://issues.apache.org/jira/browse/INFRA-4649 and the question came up regarding PMC consensus for 1) creating the list, and 2) 2-3 moderators (one of which I will fill). So... firstly, do we actually want a user@ list? I didn't realise that we required PMC consensus, but it makes sense, as it might mean more mail for people; also, I suppose our dev list isn't so busy just now, so is it actually required? I just thought it rather odd that, AFAIK, Gora seems to be the only TLP with no user@ list. Finally, if we do want a list, do we have at least one other list moderator? Thanks Lewis -- *Lewis*
Re: Jenkins build became unstable: gora-trunk » Apache Gora :: Hbase #206
Hi Lewis, I'm not sure what the problem with your Gora build is, but I hope this helps somewhat: generally I don't have the problem of hanging builds when running mvn clean test. I also just ran the tests on head and there does not seem to be a problem; the tests run without failures. Ferdy. On Tue, Apr 3, 2012 at 12:52 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Guys, I think HBase tests are quite literally all over the place just now. I'll try to have a crack at them later this afternoon, but it appears that quite often there are some dodgy outcomes from running the test suite against the hbase module. I can also confirm that my Gora build appears to hang when I build locally... does anyone else experience this? Thanks Lewis On Tue, Apr 3, 2012 at 6:13 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/gora-trunk/org.apache.gora$gora-hbase/206/ -- *Lewis*
Re: Creation of user@ lists
Hi Lewis, +1 from me. I'm happy to moderate emails, you can add me to the list :) Cheers, Chris On Apr 4, 2012, at 4:37 AM, Lewis John Mcgibbney wrote: [...] So... firstly, do we actually want a user@ list? [...] Finally, if we do want a list do we have at least one other list moderator? [...]
Re: Creation of user@ lists
On Wed, Apr 4, 2012 at 7:37 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: [...] So... firstly, do we actually want a user@ list? [...] I think there should be a user list if you want users.
Re: Republish Gora trunk Javadoc
Thanks d00d! Cheers, Chris On Apr 4, 2012, at 4:32 AM, Lewis John Mcgibbney wrote: Aye it does. I'll commit this today and write our report as well. Ta Chris. lewis [...]
Re: DRAFT GORA REPORT
Super +1. Great report dude. Cheers, Chris On Apr 4, 2012, at 11:50 AM, Lewis John Mcgibbney wrote: Hi Everyone, Please see below for a draft report. I'll send this in tomorrow unless there are objections or anything to add. Thanks Lewis Apache Gora: The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support. Project Releases: The last official project release was made on 24/09/2011, which was the 0.1.1-incubating release (the 2nd whilst in the Incubator). Since last reporting there have been few commits, but the ones we've seen have been fairly significant; there are still 4 issues to be addressed before we can progress to a 0.2 release candidate. Major issues to be addressed include implementing tests for the gora-cassandra module and an upgrade to Hadoop 1.0.0. Overall Project Activity since last report: Activity roughly shadows last month's average, with nothing exceptional taking place. A blocker issue with our usage of a particular SQL library has been dealt with; additionally, Keith Turner was able to commit his gora-accumulo module, as the distribution of Accumulo was released and available for us to use. Ferdy committed a nice piece of work which now provides users with the ability to properly support multiple data store implementations in parallel. We've also seen keen interest in our proposed GSoC project, which is to add a gora-Amazon DynamoDB module to the project, and we look forward to picking up traction with this in the near future. How has the community developed since the last report? We recently received (rather encouragingly) a report that someone struggled to join the user@ list. This was because the list did not exist; it has, however, now been created.
We've had some questions coming into the project regarding the hbase module, and whether or not we were going to support certain features within Gora; unfortunately, however, none of these issues led to any commits from outside the existing community. Changes to PMC/Committers: NONE PMC and Committer diversity: We currently have committers from a wide variety of projects, including Nutch, Tika, OODT, Camel, Solr, Accumulo and Hadoop (this is not an exhaustive list). There is work to be done with the Avro implementations, so once we are 100% ready to work on these issues, we will be looking to interest members of the Avro community in Gora. It would also be nice to attract members of the Hector and Cassandra communities, so we will work towards this goal. Project Branding or Naming issues: NONE Legal issues: NONE -- *Lewis*
Re: DRAFT GORA REPORT
+1 Lewis Thanks - Henry On Wed, Apr 4, 2012 at 11:50 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Everyone, Please see below for a draft report. I'll send this in tomorrow unless there are objections or anything to add. Thanks Lewis [...]