Re: auto flush

2012-04-04 Thread Ferdy Galema
Hi Eric,

You are right on both issues. There should be an option to turn off
autoflushing when creating the HBaseStore. I think the gora properties file
is a nice place for this option.
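
As a rough illustration of such an option, here is a minimal sketch that reads an
auto-flush switch from Gora-style properties. The key name
gora.hbasestore.autoflush and the AutoFlushConfig class are assumptions made up
for this example; the thread does not name the real property. The default of
"true" mirrors the behaviour Eric describes (auto flush currently always on).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Reads a hypothetical auto-flush switch from Gora-style properties. */
final class AutoFlushConfig {
    // Hypothetical key, chosen for illustration only.
    static final String AUTO_FLUSH_KEY = "gora.hbasestore.autoflush";

    private AutoFlushConfig() {}

    /** Returns the configured value; defaults to true when the key is absent. */
    static boolean isAutoFlush(Properties props) {
        return Boolean.parseBoolean(props.getProperty(AUTO_FLUSH_KEY, "true"));
    }

    /** Parses properties from text, as they would appear in a properties file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

The store could then pass the resulting boolean to HTable.setAutoFlush(...) when
it creates each table instance, and rely on explicit flushes otherwise.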

And indeed a flush should flush all threadlocal instances. I implemented
the threadlocal code for the HBaseStore, because at the time there was no
synchronization at all for the HBaseStore. The solution for the flushing
problem would be to use a readwrite lock so that all threads can use their
HTable instance freely (read-lock) except when flushing, because that will
block actions until the flush is done (write-lock). That way all HTable
instances will get flushed. A nicer solution would be to see if there are
other (best-practices) ways to implement multithreading HBase client code.
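
The read-write lock idea can be sketched roughly as follows. This is not the
HBaseStore code: BufferedTable is a made-up stand-in for a per-thread HTable with
a client-side write buffer, and LockingStore, LockingStoreDemo and all method
names are assumptions for illustration. Writers take the read lock and use their
own thread-local instance; flush() takes the write lock and flushes every
registered instance, whichever thread created it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a per-thread HTable with a write buffer. */
final class BufferedTable {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> sink;

    BufferedTable(List<String> sink) { this.sink = sink; }

    void put(String row) { buffer.add(row); }

    /** Moves buffered rows to the shared sink (the "server" in this sketch). */
    void flushCommits() {
        synchronized (sink) { sink.addAll(buffer); }
        buffer.clear();
    }
}

final class LockingStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Registry of every thread-local table so flush() can reach all of them.
    private final ConcurrentLinkedQueue<BufferedTable> allTables = new ConcurrentLinkedQueue<>();
    private final List<String> sink = new ArrayList<>();
    private final ThreadLocal<BufferedTable> table = ThreadLocal.withInitial(() -> {
        BufferedTable t = new BufferedTable(sink);
        allTables.add(t);
        return t;
    });

    /** Many threads may write at once; each touches only its own table. */
    void put(String row) {
        lock.readLock().lock();
        try {
            table.get().put(row);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks all writers, then flushes every thread's table. */
    void flush() {
        lock.writeLock().lock();
        try {
            for (BufferedTable t : allTables) {
                t.flushCommits();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<String> flushedRows() {
        synchronized (sink) { return new ArrayList<>(sink); }
    }
}

final class LockingStoreDemo {
    /** Writes from two threads, then flushes from the calling thread. */
    static List<String> run() {
        LockingStore store = new LockingStore();
        Thread a = new Thread(() -> store.put("row-a"));
        Thread b = new Thread(() -> store.put("row-b"));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        store.flush();
        return store.flushedRows();
    }
}
```

One caveat of this sketch: tables stay in the registry after their thread ends,
so a real implementation would also need some way to close and remove them.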

I will shortly file Jira issues and patch them.

Ferdy.

On Wed, Apr 4, 2012 at 2:45 AM, Eric Newton eric.new...@gmail.com wrote:

 Hi Lewis,

 I changed gora-hbase in my local copy to always turn off auto flush, and
 then I flush as-needed.  It looks as though I might need to make sure that
 I flush with the same thread I write with, since it is using thread-local
 storage to pick up the correct client interface.  I don't see that as a
 major problem, it's just a surprising result of the way the connection was
 implemented.  I could be wrong, though.  I only spent about 10 minutes
 looking into it, in order to figure out why HBase was slower than I
 expected.

 -Eric

 On Tue, Apr 3, 2012 at 4:23 PM, Lewis John Mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

  Hi Eric,
 
  On Tue, Apr 3, 2012 at 4:04 PM, Eric Newton eric.new...@gmail.com
 wrote:
 
   Is there any particular reason that gora-hbase uses auto flush on every
   HTable connection?
  
 
  I'm not using the HBase module at the moment so will wait for others to
  chime in, however maybe you could comment on an alternative
 implementation
  (or simply to remove auto flush making the flush configurable instead?) I
  would be really interested to hear your comments.
 
 
  
   This makes Keith Turner's excellent goraci test run very slowly for
  hbase.
  
 
  Well it's something we should defo look into then. Hopefully we can
  actually get this test suite integrated into our CI build soon, until
 then
  thanks for pointing out the problem.
 
 
  
   Also, I was unable to subscribe using user-subscr...@gora.apache.org.
  
 
  Yeah currently we don't have a user list, all traffic has been coming
 through
  dev@. I was actually going to wait until after our next release before
  getting user@ sorted as most of the work going on has been development
  since graduation. I'll progress with logging an issue with INFRA though,
  thanks for pointing this out.
 
  Lewis
 



Re: auto flush

2012-04-04 Thread Lewis John Mcgibbney
Thanks for dropping in here Ferdy.

On Wed, Apr 4, 2012 at 9:34 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote:

 There should be an option to turn off
 autoflushing when creating the HBaseStore. I think the gora properties file
 is a nice place for this option.


+1



 The solution for the flushing
 problem would be to use a readwrite lock so that all threads can use their
 HTable instance freely (read-lock) except when flushing, because that will
 block actions until the flush is done (write-lock). That way all HTable
 instances will get flushed.


Sounds reasonable, but yeah I'm all for doing a bit of investigation into
best practice for this as you mention below.


 A nicer solution would be to see if there are
 other (best-practices) ways to implement multithreading HBase client code.


Lewis


Re: Republish Gora trunk Javadoc

2012-04-04 Thread Lewis John Mcgibbney
Aye it does. I'll commit this today and write our report as well.

Ta Chris.

lewis

On Wed, Apr 4, 2012 at 2:31 AM, Mattmann, Chris A (388J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 Hey Lewis,

 I think you can do:

 1. mvn javadoc:aggregate (from top level)
 2. cp -R target/site/apidocs ../site/publish/apidocs-X.Y
 3. cd ../site/publish; svn commit -m ...

 Make sense?

 Cheers,
 Chris

 On Apr 3, 2012, at 3:09 PM, Lewis John Mcgibbney wrote:

  Hi Guys,
 
  I've published site documentation and have included my experiences of
 doing
  so on our wiki, but would like to republish the Javadoc as there have
 been
  some recent commits that I would like to get pushed for others to view
 via
  Javadoc. Can anyone provide details of how I go about doing this?
 
  Thanks
 
  Lewis
 
  --
  *Lewis*


 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.a.mattm...@nasa.gov
 WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++




-- 
*Lewis*


Creation of user@ lists

2012-04-04 Thread Lewis John Mcgibbney
Hi Guys,

I just opened https://issues.apache.org/jira/browse/INFRA-4649 and the
question came up regarding PMC consensus for

1) Creating the list, and
2) 2-3 moderators (one of which I will fill)

So... firstly, do we actually want a user@ list? I didn't realise that we
required PMC consensus, but it makes sense as it might be more mail for
people, and also I suppose that our dev list isn't so busy just now so is
it actually required? I just thought it rather odd that AFAIK Gora seems to
be the only TLP with no user@ list. Finally, if we do want a list do we
have at least one other list moderator?

Thanks

Lewis

-- 
*Lewis*


Re: Jenkins build became unstable: gora-trunk » Apache Gora :: Hbase #206

2012-04-04 Thread Ferdy Galema
Hi Lewis,

I'm not sure what the problem with your Gora build is, but I hope this
helps somewhat: Generally I don't have the problem of hanging builds when
running mvn clean test. And I just ran the tests in head and there does
not seem to be a problem. Also, the tests run without failures.

Ferdy.

On Tue, Apr 3, 2012 at 12:52 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Guys,

 I think HBase tests are quite literally all over the place just now. I'll
 try to have a crack at them later this afternoon but it appears that quite
 often there are some dodgy outcomes from running the test suite against the
 hbase module.

 I can also confirm that my Gora build appears to hang when I build
 locally... does anyone else experience this?

 Thanks

 Lewis

 On Tue, Apr 3, 2012 at 6:13 AM, Apache Jenkins Server 
 jenk...@builds.apache.org wrote:

  See 
  https://builds.apache.org/job/gora-trunk/org.apache.gora$gora-hbase/206/
 
 
 


 --
 *Lewis*



Re: Creation of user@ lists

2012-04-04 Thread Mattmann, Chris A (388J)
Hi Lewis,

+1 from me. I'm happy to moderate emails, you can add me to the list :)

Cheers,
Chris

On Apr 4, 2012, at 4:37 AM, Lewis John Mcgibbney wrote:

 Hi Guys,
 
 I just opened https://issues.apache.org/jira/browse/INFRA-4649 and the
 question came up regarding PMC consensus for
 
 1) Creating the list, and
 2) 2-3 moderators (one of which I will fill)
 
 So... firstly, do we actually want a user@ list? I didn't realise that we
 required PMC consensus, but it makes sense as it might be more mail for
 people, and also I suppose that our dev list isn't so busy just now so is
 it actually required? I just thought it rather odd that AFAIK Gora seems to
 be the only TLP with no user@ list. Finally, if we do want a list do we
 have at least one other list moderator?
 
 Thanks
 
 Lewis
 
 -- 
 *Lewis*





Re: Creation of user@ lists

2012-04-04 Thread Keith Turner
On Wed, Apr 4, 2012 at 7:37 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 Hi Guys,

 I just opened https://issues.apache.org/jira/browse/INFRA-4649 and the
 question came up regarding PMC consensus for

 1) Creating the list, and
 2) 2-3 moderators (one of which I will fill)

 So... firstly, do we actually want a user@ list? I didn't realise that we

I think there should be a user list if you want users.

 required PMC consensus, but it makes sense as it might be more mail for
 people, and also I suppose that our dev list isn't so busy just now so is
 it actually required? I just thought it rather odd that AFAIK Gora seems to
 be the only TLP with no user@ list. Finally, if we do want a list do we
 have at least one other list moderator?

 Thanks

 Lewis

 --
 *Lewis*


Re: Republish Gora trunk Javadoc

2012-04-04 Thread Mattmann, Chris A (388J)
Thanks d00d!

Cheers,
Chris

On Apr 4, 2012, at 4:32 AM, Lewis John Mcgibbney wrote:

 Aye it does. I'll commit this today and write our report as well.
 
 Ta Chris.
 
 lewis
 
 On Wed, Apr 4, 2012 at 2:31 AM, Mattmann, Chris A (388J) 
 chris.a.mattm...@jpl.nasa.gov wrote:
 
 Hey Lewis,
 
 I think you can do:
 
 1. mvn javadoc:aggregate (from top level)
 2. cp -R target/site/apidocs ../site/publish/apidocs-X.Y
 3. cd ../site/publish; svn commit -m ...
 
 Make sense?
 
 Cheers,
 Chris
 
 On Apr 3, 2012, at 3:09 PM, Lewis John Mcgibbney wrote:
 
 Hi Guys,
 
 I've published site documentation and have included my experiences of
 doing
 so on our wiki, but would like to republish the Javadoc as there have
 been
 some recent commits that I would like to get pushed for others to view
 via
 Javadoc. Can anyone provide details of how I go about doing this?
 
 Thanks
 
 Lewis
 
 --
 *Lewis*
 
 
 
 
 
 
 -- 
 *Lewis*





Re: DRAFT GORA REPORT

2012-04-04 Thread Mattmann, Chris A (388J)
Super +1.

Great report dude.

Cheers,
Chris

On Apr 4, 2012, at 11:50 AM, Lewis John Mcgibbney wrote:

 Hi Everyone,
 
 Please see below for a draft report. I'll send this in tomorrow unless
 there are objections or anything to add.
 
 Thanks
 
 Lewis
 
 Apache Gora
 
 The Apache Gora open source framework provides an in-memory data model and
 persistence for big data. Gora supports persisting to column stores, key
 value stores, document stores and RDBMSs, and analyzing the data with
 extensive Apache Hadoop MapReduce support.
 
 Project Releases
 
 The last official project release was made on 24/09/2011 which was the
 0.1.1-incubating release (2nd whilst in the Incubator). Since last
 reporting there have been few commits, but the ones we've seen
 have been fairly significant. There are still 4 issues to
 be addressed before we can progress to a 0.2 release candidate.
 Major issues to be addressed include implementing tests for the
 gora-cassandra module and an upgrade to Hadoop 1.0.0.
 
 Overall Project Activity since last report
 
 Activity roughly shadows last month's average, with nothing exceptional
 taking place. A blocker issue with our usage of a particular SQL library
 has been dealt with. Additionally, Keith Turner was able to commit his
 gora-accumulo module, as the distribution of Accumulo was released and
 available for us to use. Ferdy committed a nice piece of work which now
 provides users with the ability to properly support multiple data store
 implementations in parallel. We've also seen keen interest in our proposed
 GSoC project, which is to add a gora-Amazon DynamoDB module to the project,
 and look forward to picking up traction with this in the near future.
 
 How has the community developed since the last report?
 
 We recently received word (rather encouragingly) that someone struggled to
 join the user@ list. This was because the list did not exist; it has,
 however, now been created. We've had some questions coming into the project
 regarding the hbase module, and whether or not we were going to support
 certain features within Gora. Unfortunately, none of these issues led to
 any commits from outside the existing community.
 
 Changes to PMC & Committers
 
 NONE
 
 PMC and Committer diversity
 
 We currently have committers from a wide variety of projects including
 Nutch, Tika, OODT, Camel, Solr, Accumulo & Hadoop (this is not an exhaustive
 list). There is work to be done with the Avro implementations, so once we
 are 100% ready to work on these issues, we will be looking to interest
 members of the Avro community in Gora. It would also be nice to attract
 members of the Hector and Cassandra community so we will work towards this
 goal.
 
 Project Branding or Naming issues
 
 NONE
 
 Legal issues
 
 NONE
 
 -- 
 *Lewis*





Re: DRAFT GORA REPORT

2012-04-04 Thread Henry Saputra
+1 Lewis

Thanks

- Henry

On Wed, Apr 4, 2012 at 11:50 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 Hi Everyone,

 Please see below for a draft report. I'll send this in tomorrow unless
 there are objections or anything to add.

 Thanks

 Lewis

 Apache Gora

 The Apache Gora open source framework provides an in-memory data model and
 persistence for big data. Gora supports persisting to column stores, key
 value stores, document stores and RDBMSs, and analyzing the data with
 extensive Apache Hadoop MapReduce support.

 Project Releases

 The last official project release was made on 24/09/2011 which was the
 0.1.1-incubating release (2nd whilst in the Incubator). Since last
 reporting there have been few commits, but the ones we've seen
 have been fairly significant. There are still 4 issues to
 be addressed before we can progress to a 0.2 release candidate.
 Major issues to be addressed include implementing tests for the
 gora-cassandra module and an upgrade to Hadoop 1.0.0.

 Overall Project Activity since last report

 Activity roughly shadows last month's average, with nothing exceptional
 taking place. A blocker issue with our usage of a particular SQL library
 has been dealt with. Additionally, Keith Turner was able to commit his
 gora-accumulo module, as the distribution of Accumulo was released and
 available for us to use. Ferdy committed a nice piece of work which now
 provides users with the ability to properly support multiple data store
 implementations in parallel. We've also seen keen interest in our proposed
 GSoC project, which is to add a gora-Amazon DynamoDB module to the project,
 and look forward to picking up traction with this in the near future.

 How has the community developed since the last report?

 We recently received word (rather encouragingly) that someone struggled to
 join the user@ list. This was because the list did not exist; it has,
 however, now been created. We've had some questions coming into the project
 regarding the hbase module, and whether or not we were going to support
 certain features within Gora. Unfortunately, none of these issues led to
 any commits from outside the existing community.

 Changes to PMC & Committers

 NONE

 PMC and Committer diversity

 We currently have committers from a wide variety of projects including
 Nutch, Tika, OODT, Camel, Solr, Accumulo & Hadoop (this is not an exhaustive
 list). There is work to be done with the Avro implementations, so once we
 are 100% ready to work on these issues, we will be looking to interest
 members of the Avro community in Gora. It would also be nice to attract
 members of the Hector and Cassandra community so we will work towards this
 goal.

 Project Branding or Naming issues

 NONE

 Legal issues

 NONE

 --
 *Lewis*