[ 
https://issues.apache.org/jira/browse/GORA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159497#comment-14159497
 ] 

Renato Javier Marroquín Mogrovejo commented on GORA-270:
--------------------------------------------------------

Hi guys,

I am sorry to say that this broke the Giraph-Gora integration :(
To use Gora in an MR job, io.serializations needs to be set to 
org.apache.hadoop.io.serializer.WritableSerialization and 
org.apache.hadoop.io.serializer.JavaSerialization, and this should be done only 
once (either as part of the cluster configuration or as part of the job). 
This means that for a Hadoop job all Gora-related data will be serialized in a 
specific manner.
With this change accepted, we now need to pass this Hadoop configuration with 
every single query object, which doesn't make sense, as this is a Hadoop 
configuration and not a Gora configuration. This broke the integration 
with Giraph: a query object can no longer be generic; it has to carry the 
configuration even though that configuration has already been set for the whole 
job.
The configuration object does not contain anything related to Gora, and I think 
that is the reason why it was static. I think we should revert this, create a 
test showing that we need it, and put it back if needed.

> IOUtils static SerializationFactory field
> -----------------------------------------
>
>                 Key: GORA-270
>                 URL: https://issues.apache.org/jira/browse/GORA-270
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>            Reporter: Damien Raude-Morvan
>            Assignee: Damien Raude-Morvan
>              Labels: mapreduce
>             Fix For: 0.4
>
>         Attachments: 
> 0001-GORA-270-remove-static-reference-to-SerializationFac.patch
>
>
> (From 
> http://mail-archives.apache.org/mod_mbox/gora-dev/201308.mbox/%3CCAG50ZE_poN4C%2B%2B8t2xLZ3MoJVDMRo6nfW_Wygd%3D%3DeteF3jyLrw%40mail.gmail.com%3E)
> Right now, IOUtils keeps a *static* reference to a SerializationFactory
> which is initialized on the first call to writeObject() with a Configuration
> instance. The given Configuration is also stored in a static field of the same
> class for later use.
> But in fact each call to IOUtils.writeObject() can have a different
> Configuration instance than the previous one. In my personal use case, I have
> multiple M/R jobs which use the Gora M/R feature to process Persistent objects,
> but each job can work with a different datastore configuration (for
> instance, the name of the table/collection/column family).
> If we keep a static reference to SerializationFactory (and so its
> Configuration reference),
> QueryBase#readFields will then create a DataStore with the wrong Configuration
> (i.e. using the first DataStore/Configuration instead of the new one).
> I've started working on this issue and have come up with a possible fix:
> https://github.com/drazzib/gora/compare/apache-gora-0.2.1...ioutils_static_conf
> - remove static SerializationFactory from IOUtils (will recreate it every
> time)
> - in PartitionQueryImpl and QueryBase now send *current* configuration to
> deserialize
> One linked fix is that the Gora "drivers" need to be updated to define a
> Configuration instance in PartitionQueryImpl (like this
> https://github.com/drazzib/gora/commit/395f2e2ad50d524f42ecc563104c165fa0fa6f39
> ).
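
The caching problem described in the quoted issue can be illustrated with a minimal, self-contained sketch. It does not use the real Gora or Hadoop classes: Conf and Factory are hypothetical stand-ins for Configuration and SerializationFactory, and the "table" key is an assumed setting chosen only for illustration. It shows how a factory cached in a static field keeps the first configuration it saw, while recreating the factory on each call picks up the current one.

```java
import java.util.HashMap;
import java.util.Map;

class StaticFactoryDemo {

    /** Hypothetical stand-in for a Hadoop Configuration: a map of settings. */
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key) {
            return values.get(key);
        }
    }

    /** Hypothetical stand-in for SerializationFactory: bound to one Conf. */
    static final class Factory {
        private final Conf conf;

        Factory(Conf conf) {
            this.conf = conf;
        }

        String tableName() {
            return conf.get("table"); // "table" is an assumed key
        }
    }

    // Problematic variant: the factory is created on first use and then
    // reused, so every later caller sees the first Conf.
    private static Factory cached;

    static String tableWithStaticCache(Conf conf) {
        if (cached == null) {
            cached = new Factory(conf);
        }
        return cached.tableName();
    }

    // Variant in the spirit of the proposed fix: build the factory from
    // the Conf passed in on every call.
    static String tableWithFreshFactory(Conf conf) {
        return new Factory(conf).tableName();
    }

    // Only needed so the demo can be rerun from a clean state.
    static void resetCache() {
        cached = null;
    }

    public static void main(String[] args) {
        Conf jobA = new Conf().set("table", "users");
        Conf jobB = new Conf().set("table", "orders");

        System.out.println(tableWithStaticCache(jobA));  // users
        System.out.println(tableWithStaticCache(jobB));  // users (stale)
        System.out.println(tableWithFreshFactory(jobB)); // orders
    }
}
```

The trade-off raised in the comment above still applies: the per-call variant requires every caller to have the right configuration at hand, which is what the static field avoided.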



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
