[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2231:
----------------------------------------

    Attachment: 0001-Add-compositeType-and-DynamicCompositeType.patch

bq. (1) having validation only require that the components, if they're 
provided, are of the correct type (looking at Sylvain's code, the compare 
function should work, but the validation wouldn't)

That's a bug. The last test of validate() should be negated. The only goal was 
forbid having leading bytes after all the comparator have been used (because 
compare makes the assumption that there will not have bytes without a proper 
comparator for them).

However it is clear that we should allow a key having less component than 
comparator (btw, the unit test in my patch was actually doing it).

This is fixed in the patch I'm attaching.

bq. My other concern is that one of the things that got stripped out was the 
MATCH_MINIMUM, MATCH_MAXIMUM feature, which made implementing 
greater-than-equals, less-than-equals, etc. in ranges much easier

I'm not familiar enough with your code to be sure what those were doing 
exactly, but I suppose that the goal is that, if for instance you have the 
following column names:
{noformat}
    test:bar:42
    test:foo:24
    test:foo:42
    test:foobar
{noformat}
then you want to be able to query anything that starts with the two component 
{{test}} and {{foo}} (that is {{test:foo:24}} and {{test:foo:42}} in my 
example).

Greater-than is already doable in my previous patch (up to the bug in 
validation). For the less-than part, I agree that it is nice to be able to do 
it easily. In my new patch, I add a leading byte to each component, whose 
purpose is to always be 0, except for lesser-than query. That way, you can do 
the query above easily. The price is a slightly more complicated encoding but I 
think it's totally worth it.


bq. While i agree in theory, this is why thrift exists right?

I don't know thrift as well as you do, but I would think that changing the type 
of column name would be a massive compatibility breaker. And I really don't see 
a not crappy way to do it. Well, actually I do, but it consists in revamping 
the API completely, ditching the super columns in favor of composite column 
name. Don't get me wrong, I'm 100% in, but it's well beyond this particular 
ticket and I think that as long as we have a generic and simple enough encoding 
for those composite type, I believe it's ok to include this as is and just have 
a decent description of the encoding for client to use. We can still branch 
that later on on thrift anyway.

bq. (2) creating a DynamicType that could be used for dynamic types

I do agree with you that it could be useful and I understand the use case. I 
also agree that whatever we do, we should make it clear that it is not beginner 
stuff and that you shouldn't use a dynamic composite type unless you really 
need it (since the possibility of shooting yourself in the foot if you don't 
know what you do is fairly high).  All this being said, I took a stab at a 
DynamicCompositeType and I think the result really makes sense.

The main idea is really just to modify my static CompositeType so that each 
component starts with the name of the comparator class (that way, this is again 
completely generic, custom comparator can be used without problems).  There is 
then 2 optimisations to make it a reasonable solution:
  # FBUtilities.getComparator() uses a cache, to avoid the cost of 
introspection.
  # There is a notion of alias. They are user defined and are local to each 
DynamicCompositeType (cf. test/conf/cassandra.yaml in the patch). An alias is a 
simple (printable) ascii character (there is around 100 possible alias which 
should be aplenty). You can then use those alias in the encoded column name, in 
which case you only need 2 bytes to encode the comparator, making it fairly 
compact.
The encoded format is very close to the one of the static CompositeType, even 
though it is a tiny bit more complex due to the aliases handling. But given 
that dynamic composite type is not meant for the faint of heart, that should be 
fine.

I'm attaching a new patch (against 0.7 branch) with both the static and dynamic 
composite type. The bulk of the code is shared between them. This also include 
a fair amount of unit tests.


> Add CompositeType comparer to the comparers provided in 
> org.apache.cassandra.db.marshal
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib
>    Affects Versions: 0.7.3
>            Reporter: Ed Anuff
>            Priority: Minor
>         Attachments: 0001-Add-compositeType-and-DynamicCompositeType.patch, 
> 0001-Add-compositeType.patch, edanuff-CassandraCompositeType-1e253c4.zip
>
>
> CompositeType is a custom comparer that makes it possible to create 
> comparable composite values out of the basic types that Cassandra currently 
> supports, such as Long, UUID, etc.  This is very useful in both the creation 
> of custom inverted indexes using columns in a skinny row, where each column 
> name is a composite value, and also when using Cassandra's built-in secondary 
> index support, where it can be used to encode the values in the columns that 
> Cassandra indexes.  One scenario for the usage of these is documented here: 
> http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
> contribution is attached and has been previously maintained on github here: 
> https://github.com/edanuff/CassandraCompositeType

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to