[ https://issues.apache.org/jira/browse/ACCUMULO-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ed Kohlwey updated ACCUMULO-1551:
---------------------------------

    Description: 
I wanted to create a new ticket for my thoughts on this. I'd like to introduce
a paradigm similar to the object inspectors used in Hive to get data in and out
of Accumulo.

The base motivation for this is that the Accumulo API is inconsistent. It is
difficult for application developers to use, and it confuses new developers
because Text, CharSequence, and byte[] are used inconsistently to represent the
various parts of a key. This is totally unnecessary and is, in my mind, a huge
black eye.

Aside from providing a mechanism that could eventually be used to improve read
performance in the client, this would also give application developers a
simpler paradigm and would accomplish some aspects of ORM, à la Typo and Gora
(although distinct from the goals and scope of Gora).

I've attached an initial pull request/code review outlining how I think the
refactoring would work in the scanner. Basically, the old API would be
preserved by introducing generic supertypes, along with a class that allows
serialization directly from the ByteSequence objects.
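For illustration only, here is a minimal, self-contained Java sketch of the generic-supertype idea. It is not the code in the linked branch. ByteView, Inspector, GenericKey, RAW, UTF8 and LONG are hypothetical names; ByteView stands in for Accumulo's ByteSequence so the example compiles without Accumulo on the classpath, and the key is reduced to row, column family and column qualifier (no visibility or timestamp). Each key part is decoded straight from its backing bytes by a caller-chosen inspector, so the caller picks the Java type instead of the API fixing Text, CharSequence or byte[].

```java
import java.nio.charset.StandardCharsets;

public class GenericKeySketch {

  // Hypothetical stand-in for Accumulo's ByteSequence: a view of a byte[] slice (no copying).
  record ByteView(byte[] data, int offset, int length) {
    static ByteView of(String s) {
      byte[] b = s.getBytes(StandardCharsets.UTF_8);
      return new ByteView(b, 0, b.length);
    }
  }

  // Hypothetical "object inspector": decodes a typed value straight from a byte slice.
  interface Inspector<T> {
    T decode(byte[] data, int offset, int length);

    default T decode(ByteView v) {
      return decode(v.data(), v.offset(), v.length());
    }
  }

  // Hypothetical generic supertype for a key: R, F and Q are the caller-chosen
  // types of the row, column family and column qualifier.
  record GenericKey<R, F, Q>(R row, F family, Q qualifier) {
    static <A, B, C> GenericKey<A, B, C> decode(
        ByteView row, ByteView family, ByteView qualifier,
        Inspector<A> rowIn, Inspector<B> famIn, Inspector<C> qualIn) {
      return new GenericKey<>(rowIn.decode(row), famIn.decode(family), qualIn.decode(qualifier));
    }
  }

  // Worst case: hand back the bytes as a view, without copying or decoding.
  static final Inspector<ByteView> RAW = ByteView::new;

  // Decode the slice as UTF-8 text.
  static final Inspector<String> UTF8 = (d, o, l) -> new String(d, o, l, StandardCharsets.UTF_8);

  // Decode the slice as a big-endian unsigned number; assumes at most 8 bytes.
  static final Inspector<Long> LONG = (d, o, l) -> {
    long x = 0;
    for (int i = o; i < o + l; i++) {
      x = (x << 8) | (d[i] & 0xFF);
    }
    return x;
  };

  public static void main(String[] args) {
    // Row and family are stored as UTF-8; the qualifier is an 8-byte big-endian number.
    GenericKey<String, String, Long> k = GenericKey.decode(
        ByteView.of("user42"), ByteView.of("stats"),
        new ByteView(new byte[] {0, 0, 0, 0, 0, 0, 1, 0}, 0, 8),
        UTF8, UTF8, LONG);
    System.out.println(k.row() + " " + k.family() + " " + k.qualifier()); // user42 stats 256
  }
}
```

In this reading, the existing Text-based API would correspond to one fixed choice of the three type parameters, which is one way the old API could be kept while new code chooses its own types.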

While it may be true that some people have highly heterogeneous data in their
tables, the worst-case scenario here is that you just use the ByteSequences
directly. Even in that base case, however, this allows substantially simpler
access by making the access pattern consistent. In other cases, where a scan
covers only a particular column or the data is very homogeneous, the benefit is
even greater.
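A second hypothetical sketch, again not taken from the linked branch, of the access-pattern point: the same readColumn call handles the worst case (heterogeneous data kept as raw bytes) and a homogeneous column (every value a big-endian number); only the decoding function changes. Cell, readColumn, RAW and BIG_ENDIAN_LONG are illustrative names, and java.util.function.Function stands in for an inspector type.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ConsistentScanSketch {

  // Hypothetical scanned cell: a column qualifier plus the raw value bytes.
  record Cell(String qualifier, byte[] value) {}

  // Collects the values of one column, decoded by a caller-chosen function.
  // The loop is the same whatever type the function produces.
  static <T> List<T> readColumn(List<Cell> cells, String qualifier, Function<byte[], T> decoder) {
    List<T> out = new ArrayList<>();
    for (Cell c : cells) {
      if (c.qualifier().equals(qualifier)) {
        out.add(decoder.apply(c.value()));
      }
    }
    return out;
  }

  // Worst case (heterogeneous data): keep the value bytes as they are.
  static final Function<byte[], byte[]> RAW = b -> b;

  // Homogeneous column: every value is a big-endian number of at most 8 bytes.
  static final Function<byte[], Long> BIG_ENDIAN_LONG = b -> {
    long x = 0;
    for (byte v : b) {
      x = (x << 8) | (v & 0xFF);
    }
    return x;
  };

  public static void main(String[] args) {
    List<Cell> scanned = List.of(
        new Cell("count", new byte[] {0, 1}),
        new Cell("name", "alice".getBytes(StandardCharsets.UTF_8)),
        new Cell("count", new byte[] {1, 0}));
    List<Long> counts = readColumn(scanned, "count", BIG_ENDIAN_LONG);
    List<byte[]> names = readColumn(scanned, "name", RAW);
    System.out.println(counts); // [1, 256]
    System.out.println(new String(names.get(0), StandardCharsets.UTF_8)); // alice
  }
}
```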

https://github.com/ekohlwey/accumulo/compare/apache:trunk...ACCUMULO-1551

  was:
I wanted to create a new ticket for my thoughts on this. I'd like to introduce
a paradigm similar to the object inspectors used in Hive to get data in and out
of Accumulo.

The base motivation for this is that the Accumulo API is inconsistent. It is
difficult for application developers to use, and it confuses new developers
because Text, CharSequence, and byte[] are used inconsistently to represent the
various parts of a key. This is totally unnecessary and is, in my mind, a huge
black eye.

Aside from providing a mechanism that could eventually be used to improve read
performance in the client, this would also give application developers a
simpler paradigm and would accomplish some aspects of ORM, à la Typo and Gora
(although distinct from the goals and scope of Gora).

I've attached an initial pull request/code review outlining how I think the
refactoring would work in the scanner. Basically, the old API would be
preserved by introducing generic supertypes, along with a class that allows
serialization directly from the ByteSequence objects.

While it may be true that some people have highly heterogeneous data in their
tables, the worst-case scenario here is that you just use the ByteSequences
directly. Even in that base case, however, this allows substantially simpler
access by making the access pattern consistent. In other cases, where a scan
covers only a particular column or the data is very heterogeneous, the benefit
is even greater.

https://github.com/ekohlwey/accumulo/compare/apache:trunk...ACCUMULO-1551

    
> Introduce Generic Supertypes to Replace Text
> --------------------------------------------
>
>                 Key: ACCUMULO-1551
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1551
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Ed Kohlwey
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
