[ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277168#comment-14277168 ]
Renato Javier Marroquín Mogrovejo commented on GORA-401:
--------------------------------------------------------

I think we are confusing things here, guys. One thing is the move of _g_dirty into the base class. This was motivated by users trying to use this field directly, but it is part of Gora internals, so users shouldn't have access to it.
As far as I am aware, the serialization process implies that whatever was dirty in memory (parts of the Avro object) gets flushed to the back-end, and after that the object is clean again. So after deserializing, meaning that we have just read it from the back-end, it should not be dirty. Isn't this the way we have been working in the different modules?

> Serialization and deserialization of Persistent does not hold the entity dirty state
> ------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but seems logically to hold on gora-0.5. HBase backend.
>            Reporter: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>         Attachments: GORA-401-tests.patch
>
>   Original Estimate: 35h
>          Time Spent: 4h
>  Remaining Estimate: 31h
>
> After removing the __g__dirty field in GORA-326, the dirty field is not serialized.
> In GORA-321
> {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
> went from using
> {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
> to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty
> field to Avro (but it is really not desirable to have that field as a main
> field in the entities).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which
> will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its
> phases, serializes entities (from Map to Reduce), and on deserialization
> finds all fields marked as "dirty", independently of which fields were
> modified in the Map, and so overwrites all data in the datastore (deleting
> many things: downloaded content, parsed content, etc.).
> This effect can be seen in
> {{TestPersistentSerialization#testSerderEmployeeTwoFields}}, when debugging in
> {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections
> show that entities are "equal" when their fields are equal. This is fine as a
> definition of "equal", but another test must be added to check that
> serialization and deserialization keep the dirty state.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
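
For reference, the kind of round-trip check the quoted description asks for could look roughly like the sketch below. It deliberately uses neither Gora nor Avro: the {{Employee}} class, its two fields, the bit layout and the wire format are all hypothetical stand-ins, and the only point is that the dirty bits are written next to the field values and restored on read without marking anything else dirty.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

/** Illustrative only: a hypothetical entity, NOT the real Gora Persistent API. */
final class DirtyStateSketch {

    static final class Employee {
        String name = "";
        long salary;
        // Assumed layout for this sketch: bit 0 = name, bit 1 = salary.
        final BitSet dirty = new BitSet(2);

        void setName(String n) { name = n; dirty.set(0); }
        void setSalary(long s) { salary = s; dirty.set(1); }
    }

    /** Writes the dirty bits first, then the field values. */
    static byte[] serialize(Employee e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] dirtyBytes = e.dirty.toByteArray();
            out.writeInt(dirtyBytes.length);
            out.write(dirtyBytes);
            out.writeUTF(e.name);
            out.writeLong(e.salary);
        }
        return bytes.toByteArray();
    }

    /** Restores fields directly (bypassing setters) and then the saved dirty bits. */
    static Employee deserialize(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] dirtyBytes = new byte[in.readInt()];
            in.readFully(dirtyBytes);
            Employee e = new Employee();
            e.name = in.readUTF();
            e.salary = in.readLong();
            e.dirty.or(BitSet.valueOf(dirtyBytes));
            return e;
        }
    }

    /** Convenience wrapper so callers do not have to handle IOException. */
    static Employee roundTrip(Employee e) {
        try {
            return deserialize(serialize(e));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Employee original = new Employee();
        original.setName("Alice"); // only the name is modified
        Employee copy = roundTrip(original);
        if (!copy.dirty.equals(original.dirty)) {
            throw new AssertionError("dirty state lost in round trip");
        }
        System.out.println("dirty bits after round trip: " + copy.dirty);
    }
}
```

A test in this spirit would compare the dirty state before and after the round trip in addition to the field values, which is the part the existing equality check does not cover. Whether a freshly deserialized entity *should* carry its dirty state is exactly the point under discussion in this thread; the sketch only shows what the description's proposal would amount to.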