Lewis, thanks for your comments.

My GORA-142 patch uses Hector's serializers plus several new serializers built on top of them. It should be upward compatible with gora-cassandra 0.2 and work with Nutch 2.0. Currently, it supports:

single column:
* primitive type (fixed length): boolean, int, long, float, double
* primitive type (variable length): bytes, string
* complex type (fixed length): fixed
* array of the above types
** fixed-length element - (element0) (element1) (element2) ...
** variable-length element - (int: length of element 0) (element0) (int: length of element 1) (element1) ...

super column family:
* record whose field types are supported as a single column, including array
** column name is field name (string)
* map whose value type is supported as a single column, including array
** column name is key (string)
* array whose element type is supported as a single column, including array
** column name is index of array (int)

Since I wrote the patch more than a month ago, I will test it again before committing to trunk later this week.

Regards,
-Kaz

On 7/10/12 2:04 PM, Lewis John Mcgibbney wrote:
> Hi,
> As I was going to spin an RC for 0.3 this weekend I thought I'd just
> give my brief input on this one before the weekend is upon us...
>
> On Mon, Jul 9, 2012 at 5:50 PM, Kazuomi Kashii <[email protected]> wrote:
>
>> Regarding Gora Hadoop job related parts in your comment, I have not
>> checked the detail yet,
>> but it seems that Nutch 2.0 is working with the current stable version
>> of gora-cassandra 0.2 with Hector's serializers.
> +1
>
>> Also, I cannot find any code related to Hadoop in gora-cassandra, so I
>> thought that it should be handled in gora-core.
>> I will be checking that part, but if I am totally wrong, please correct me.
> I wonder if you managed to get any definitive outcome for this Kaz?
>
>> In summary,
>> * gora-cassandra 0.2 uses Hector's serialization; and
>> * gora-cassandra 0.3 is not compatible, if Avro's serialization is
>> introduced in 0.3.
> IMHO I think it is really important that we retain flawless backwards
> compatibility.
> Generally speaking there has been a pretty large effort
> to firstly get Gora more stabilized before opting to do the same for
> Nutch 2.X, we are now in a reasonable position to continue building
> out and unless it was absolutely necessary then I would vote against
> changing to Avro for serialization within gora-cassandra. The decision
> was made by Alexis to support Hector as primary Gora client some time
> ago, personally I would like to see this commitment shadowed
> throughout the entire Hector API usage within the module unless of
> course the Avro serializers are superior... as no clear argument has
> been presented I'm -1 against implementing Avro stuff over Hector
> stuff within the module. Finally although the Avro usage within
> gora-core and elsewhere is somewhat dated, we are aware of this and
> AFAIK there is work in motion to get it updated, however I don't know
> how this is progressing. To change the serialization spec in
> gora-cassandra for it then to be possibly overhauled would not make
> sense to me either.
>
>> so my recommendation is to keep using Hector's serializers in
>> gora-cassandra 0.3 and later.
> I concur +1. This basically reflects the initial reason for the
> DISCUSS thread on the 0.3 release. I think the work which has gone
> into especially gora-cassandra (having been slightly neglected in the
> past) justifies an interim release.
>
> Thanks guys and in all honesty it would be excellent to kick off more
> conversation on this one before the weekend comes around.
>
> Lewis
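
For reference, the single-column array layout listed at the top of this message can be sketched roughly as follows. This is only an illustration and not code from the GORA-142 patch: the class and method names are hypothetical, and the 4-byte big-endian length prefix and UTF-8 string encoding are assumptions made for the example. It uses only java.nio and no Hector classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the array layout; names are not from the patch.
class ArrayLayoutSketch {

    // Fixed-length elements: (element0)(element1)(element2)... with no prefix.
    static byte[] encodeInts(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static int[] decodeInts(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int[] out = new int[bytes.length / Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    // Variable-length elements: (int: length)(element) repeated.
    // Assumption: strings are encoded as UTF-8 bytes.
    static byte[] encodeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] b : encoded) {
            buf.putInt(b.length); // length prefix for this element
            buf.put(b);           // element bytes
        }
        return buf.array();
    }

    static List<String> decodeStrings(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        List<String> out = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] b = new byte[buf.getInt()]; // read the length prefix
            buf.get(b);                        // then that many bytes
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] ints = encodeInts(new int[] {1, 2, 3});
        System.out.println(Arrays.toString(decodeInts(ints)));

        byte[] strs = encodeStrings(Arrays.asList("ab", "", "xyz"));
        System.out.println(decodeStrings(strs));
    }
}
```

The point of the length prefix is that variable-length elements such as bytes and string cannot be split by position alone, whereas fixed-length elements can be read back simply by stepping through the buffer in element-sized strides.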

