Hi Damien,

It's really nice that there are people out there helping out with the Gora MapReduce stuff, thanks! (:

I think you are right about the problem with the static reference, and for some use cases it is certainly not suitable. As you have already started on this, I think it totally makes sense. I have been looking at the MapReduce classes (GoraInputFormat, GoraInputSplit, and others) these last few days, and I totally understand what you are talking about. Maybe you'd like to open a JIRA for this, and if you could put a patch up I'd be happy to push it.

Now, about PartitionQueryImpl: I saw you are also using a single partition, as you are not using sharded MongoDB, but how would you envision using this if you were? I am asking because I am trying to fix the Hadoop support for Cassandra, and I haven't got a clear idea of how to do it. Every data store is different, and having a standard approach would probably help other modules get this right.

Thanks Damien!
Renato M.

2013/8/11 Damien Raude-Morvan <[email protected]>

> Hi folks,
>
> I think I might have found an issue in Gora IOUtils class.
>
> Right now, IOUtils keep a *static* reference to an SerializationFactory
> which is initialized on first call to writeObject() with a Configuration
> instance. Given Configuration is also stored in a static field of same
> class for latter usage.
>
> But in fact each call to IOUtils.writeObject() can have a different
> Configuration instance than previous one. In my personnal use case, I've
> multiple M/R jobs which use Gora M/R feature to process Persistent object
> but each job can work with a different datastore configuration (for
> instance, name of table/collection/colum family).
>
> If we keep a static reference to SerializationFactory (and so its
> Configuration reference),
> QueryBase#readFields will then create a DataStore with wrong Configuration
> (ie. using first DataStore/Configuration instead of new one)
>
> I've started working on this issue, and come up with a possible fix :
>
> https://github.com/drazzib/gora/compare/apache-gora-0.2.1...ioutils_static_conf
> - remove static SerializationFactory from IOUtils (will recreate it every
> time)
> - in PartitionQueryImpl and QueryBase now send *current* configuration to
> deserialize
> One linked fix, is that gora "drivers" needs to be updated to define
> Configuration instance in PartitionQueryImpl (like this
>
> https://github.com/drazzib/gora/commit/395f2e2ad50d524f42ecc563104c165fa0fa6f39
> ).
>
> What do you think about this issue ?
> If you need it, I can produce a reduced test case to help you understanding
> this
>
> Cheers,
> --
> Damien
>
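
For anyone following the thread, the static-reference problem described in the quoted message can be sketched roughly as below. This is only an illustration and does not use the real Gora or Hadoop classes: Conf, Factory, CachingUtils and PerCallUtils are hypothetical stand-ins (for Configuration, SerializationFactory, and the current and proposed IOUtils behaviour respectively), and the "table" key is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticFactoryDemo {

    /** Hypothetical stand-in for a Hadoop Configuration: a plain key/value map. */
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key) {
            return values.get(key);
        }
    }

    /** Hypothetical stand-in for a SerializationFactory bound to one Conf. */
    static final class Factory {
        private final Conf conf;

        Factory(Conf conf) {
            this.conf = conf;
        }

        String targetTable() {
            return conf.get("table");
        }
    }

    /** Pattern described as problematic: the factory is created once and reused. */
    static final class CachingUtils {
        private static Factory factory;

        static String tableFor(Conf conf) {
            if (factory == null) {
                factory = new Factory(conf); // the first caller's Conf is kept
            }
            return factory.targetTable();    // later Conf arguments are ignored
        }
    }

    /** Direction of the proposed fix: build the factory from the Conf of each call. */
    static final class PerCallUtils {
        static String tableFor(Conf conf) {
            return new Factory(conf).targetTable();
        }
    }

    public static void main(String[] args) {
        Conf jobA = new Conf().set("table", "users");
        Conf jobB = new Conf().set("table", "orders");

        // The cached variant answers with the first job's table both times.
        System.out.println("cached,   job A: " + CachingUtils.tableFor(jobA));
        System.out.println("cached,   job B: " + CachingUtils.tableFor(jobB));

        // The per-call variant follows the Conf it was given.
        System.out.println("per call, job A: " + PerCallUtils.tableFor(jobA));
        System.out.println("per call, job B: " + PerCallUtils.tableFor(jobB));
    }
}
```

The per-call variant trades a little object creation for correctness when several jobs with different configurations share one JVM. A cache keyed by configuration would be another option, but that is not what the linked branch is described as doing.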

