[ 
https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267565#comment-14267565
 ] 

Lewis John McGibbney commented on GORA-401:
-------------------------------------------

Hi Alfonso et al. I've read through this and would like to spend some time 
today actually absorbing the points made here. It should be mentioned at this 
stage that the work undertaken in GORA-94 is no less trivial now than it was 
then. Do we have regression issues? Yes. What are they? Here are a few...
 * The GoraCompiler has changed entirely in terms of functionality
 * The GoraCompiler has changed entirely in terms of the way it is physically 
invoked. This is something which we can actually remedy through some carefully 
crafted ports of functionality from the old GoraCompiler to the new one. I 
intended to more thoroughly document some of these in GORA-324 once I wake back 
up.
 * Alfonso has indicated how StateManager was effectively not only deprecated 
(as there was no real way to do this without having a horribly convoluted 
codebase) but deleted entirely from the code base. In all honesty I remember 
talking about this at length here within the community and with 
[~ap.giannakidis] in Dublin at the NoSQL meetup when I gave a 
[presentation|http://prezi.com/b5_vabnmelmy/?utm_campaign=share&utm_medium=copy&rc=ex0share]
 on what was proposed and what ultimately changed.
 * There are more here, folks; however, lingering on them without addressing 
them directly is a fruitless effort. I would rather work towards a solution.

I would suggest one thing right now which is that if [~alfonso.nishikawa], you 
wish to reintroduce the PersistentDatumWriter/Reader then by all means please 
go ahead. There is absolutely nothing stopping you. 
I also want to state that the logic, reasoning and justification (all possibly 
bundled into one primary driving force) for a move towards upgrading Avro in 
Gora in the manner we did was that Avro had changed SO much between 1.3.3 --> 
1.7.X with so many improvements that any issues we were having with regards to 
serialization were not really compatible/comparable with what was being 
experienced within the Avro community. It is safe to say that when Gora 
initially entered incubation at the ASF, Avro was in its infancy. The library 
has moved on and I think we need to ensure that Gora does the same.
Finally, this is *exactly* why I was (and still am) *extremely* keen to get 
moving with GoraCI under the RackSpace hosting we have available. I am of the 
opinion that we need to be putting the Gora serialization code under much more 
scrutiny. This way we can hopefully reach consensus on what we need to 
implement based on facts in addition to opinion (I apologize if this sounds a 
bit paradoxical).

I'll make an effort to look into all of the issues you've raised 
[~alfonso.nishikawa], thank you for voicing them.

> Serialization and deserialization of Persistent does not hold the entity 
> dirty state
> ------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but seems logically to hold on 
> gora-0.5
>            Reporter: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>   Original Estimate: 35h
>  Remaining Estimate: 35h
>
> After removing __g__dirty field in GORA-326, dirty field is not serialized. 
> In GORA-321 
> {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
>  went from using 
> {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
>  to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty 
> field to Avro (but really not desirable to have that field as a main field in 
> the entities).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}} which 
> will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its 
> phases, serializes entities (from Map to Reduce), and on deserialization finds 
> all fields as "dirty", independently of what fields were modified in the Map, 
> and overwrites all data in the datastore (deleting many things: downloaded 
> content, parsed content, etc).
> This effect can be seen in 
> {{TestPersistentSerialization#testSerderEmployeeTwoFields}}, when debugging in 
> {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections 
> show that entities are "equal" when their fields are equal. This is fine as 
> a definition of "equal", but another test must be added to check that 
> serialization and deserialization keep the dirty state.
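
For readers following along, the behaviour described in the issue can be 
illustrated with a minimal, self-contained sketch. Everything in it is a 
hypothetical stand-in and not Gora or Avro code: the Employee class, its 
per-field dirty bits and the four helper methods are assumptions made purely 
for illustration, and plain java.io streams replace Avro's datum writers. The 
first round trip writes only the data fields and repopulates them through the 
setters, so every field comes back dirty; the second also writes the dirty 
bits, which is roughly the idea behind reintroducing a writer/reader pair that 
serializes internal state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical illustration only; none of these names exist in Gora.
class DirtyStateSketch {

    /** Toy entity with two data fields and one dirty bit per field. */
    static final class Employee {
        String name;                         // field index 0
        int salary;                          // field index 1
        final BitSet dirty = new BitSet(2);  // internal state, not a data field

        void setName(String n) { name = n; dirty.set(0); }
        void setSalary(int s) { salary = s; dirty.set(1); }
        void clearDirty() { dirty.clear(); }
    }

    /** Writes only the data fields; the dirty bits are dropped. */
    static byte[] writeFieldsOnly(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(e.name);
            out.writeInt(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads via the setters, so every field ends up marked dirty. */
    static Employee readFieldsOnly(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Employee e = new Employee();
            e.setName(in.readUTF());
            e.setSalary(in.readInt());
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Writes the dirty bits ahead of the data fields. */
    static byte[] writeWithDirty(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(e.dirty.get(0));
            out.writeBoolean(e.dirty.get(1));
            out.writeUTF(e.name);
            out.writeInt(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Restores the data fields directly, then the recorded dirty bits. */
    static Employee readWithDirty(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            boolean nameDirty = in.readBoolean();
            boolean salaryDirty = in.readBoolean();
            Employee e = new Employee();
            e.name = in.readUTF();
            e.salary = in.readInt();
            e.dirty.set(0, nameDirty);
            e.dirty.set(1, salaryDirty);
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.setName("Ada");
        e.setSalary(100);
        e.clearDirty();     // pretend the entity was just loaded from the store
        e.setSalary(120);   // the "Map" phase modifies a single field

        System.out.println("original:     " + e.dirty);
        System.out.println("fields only:  " + readFieldsOnly(writeFieldsOnly(e)).dirty);
        System.out.println("with dirty:   " + readWithDirty(writeWithDirty(e)).dirty);
    }
}
```

Both round trips return entities whose field values match the original, which 
is why a test that compares fields alone would pass in either case. A test 
aimed at this bug would additionally compare the dirty bits before and after 
the round trip.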



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
