Re: [DISCUSS] gora-cassandra serialization spec

Kazuomi Kashii Wed, 11 Jul 2012 01:10:43 -0700

Lewis, thanks for your comments.

My GORA-142 patch uses Hector's serializers, plus several new serializers
built on top of them.
It should be upward compatible with gora-cassandra 0.2 and should work with
Nutch 2.0.
Currently, it supports:


single column:
* primitive type (fixed length) : boolean, int, long, float, double
* primitive type (variable length) : bytes, string
* complex type (fixed length) : fixed
* array of the above types
** fixed length element - (element0) (element1) (element2) ...
** variable length element - (int:length of element 0) (element0)
   (int:length of element 1) (element1) ...
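
For illustration only, the two array layouts listed above can be sketched in
plain Java. This is not code from the GORA-142 patch: the class and method
names are hypothetical, and the big-endian byte order and UTF-8 string
encoding are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two array layouts; not the actual patch code.
class ArrayLayoutSketch {

    // Fixed-length elements: (element0) (element1) ... with no length prefix.
    // Assumes 8-byte big-endian longs (ByteBuffer's default byte order).
    static byte[] encodeLongs(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    static long[] decodeLongs(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] out = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    // Variable-length elements: (int: length of element) (element) ...
    // Assumes strings are stored as UTF-8 bytes.
    static byte[] encodeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] b : encoded) {
            buf.putInt(b.length); // 4-byte length prefix
            buf.put(b);           // element bytes
        }
        return buf.array();
    }

    static List<String> decodeStrings(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        List<String> out = new ArrayList<>();
        while (buf.hasRemaining()) {
            int len = buf.getInt();
            byte[] b = new byte[len];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

The length prefix is what lets a reader find element boundaries when sizes
differ; fixed-length elements can be split by size alone.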

super column family
* record whose field types are each supported as a single column (arrays included)
** column name is the field name (string)
* map whose value type is supported as a single column (arrays included)
** column name is the key (string)
* array whose element type is supported as a single column (arrays included)
** column name is the array index (int)
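
As a rough sketch of the column naming described above (again hypothetical
and not taken from the patch), a map or array value could be spread over the
columns of a super column as follows. The UTF-8 encoding of string names and
the 4-byte big-endian encoding of array indexes are assumptions; a record
would be handled like the map case, with field names as keys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of super column naming; not the actual patch code.
class SuperColumnLayoutSketch {

    // Records and maps: the column name is the field name or key as a string.
    static ByteBuffer stringColumnName(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    // Arrays: the column name is the element index as an int.
    static ByteBuffer indexColumnName(int index) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(index);
        buf.flip(); // make the 4 written bytes readable
        return buf;
    }

    // One column per map entry, named by its key.
    static <V> Map<ByteBuffer, V> columnsForMap(Map<String, V> map) {
        Map<ByteBuffer, V> columns = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : map.entrySet()) {
            columns.put(stringColumnName(e.getKey()), e.getValue());
        }
        return columns;
    }

    // One column per array element, named by its index.
    static <V> Map<ByteBuffer, V> columnsForArray(List<V> elements) {
        Map<ByteBuffer, V> columns = new LinkedHashMap<>();
        for (int i = 0; i < elements.size(); i++) {
            columns.put(indexColumnName(i), elements.get(i));
        }
        return columns;
    }
}
```

Each column value would then be serialized with the single-column encoding
for its type.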

Since I wrote the patch more than a month ago,
I will test it again before committing to trunk later this week.

Regards,
-Kaz


On 7/10/12 2:04 PM, Lewis John Mcgibbney wrote:
> Hi,
> As I was going to spin an RC for 0.3 this weekend I thought I'd just
> give my brief input on this one before the weekend is upon us...
>
> On Mon, Jul 9, 2012 at 5:50 PM, Kazuomi Kashii <[email protected]> wrote:
>
>> Regarding Gora Hadoop job related parts in your comment, I have not
>> checked the details yet,
>> but it seems that Nutch 2.0 is working with the current stable version
>> of gora-cassandra 0.2 with Hector's serializers.
> +1
>
>> Also, I cannot find any code related to Hadoop in gora-cassandra, so I
>> thought that it should be handled in gora-core.
>> I will be checking that part, but if I am totally wrong, please correct me.
> I wonder if you managed to get any definitive outcome for this Kaz?
>
>> In summary,
>> * gora-cassandra 0.2 uses Hector's serialization; and
>> * gora-cassandra 0.3 will not be compatible with 0.2 if Avro's
>> serialization is introduced in 0.3.
> IMHO I think it is really important that we retain flawless backwards
> compatibility. Generally speaking there has been a pretty large effort
> to firstly get Gora more stabilized before opting to do the same for
> Nutch 2.X, we are now in a reasonable position to continue building
> out and unless it was absolutely necessary then I would vote against
> changing to Avro for serialization within gora-cassandra. The decision
> was made by Alexis to support Hector as primary Gora client some time
> ago, personally I would like to see this commitment shadowed
> throughout the entire Hector API usage within the module unless of
> course the Avro serializers are superior... as no clear argument has
> been presented I'm -1 against implementing Avro stuff over Hector
> stuff within the module. Finally although the Avro usage within
> gora-core and elsewhere is somewhat dated, we are aware of this and
> AFAIK there is work in motion to get it updated, however I don't know
> how this is progressing. To change the serialization spec in
> gora-cassandra for it then to be possibly overhauled would not make
> sense to me either.
>
>> so my recommendation is to keep using Hector's serializers in
>> gora-cassandra 0.3 and later.
> I concur +1. This basically reflects the initial reason for the
> DISCUSS thread on the 0.3 release. I think the work which has gone
> into especially gora-cassandra (having been slightly neglected in the
> past) justifies an interim release.
>
> Thanks guys and in all honesty it would be excellent to kick off more
> conversation on this one before the weekend comes around.
>
> Lewis
