Hi, Eric,

I am not trying to change the hash function, but your proposal is very
interesting anyway and would help in a lot of ways. The difficulty, as you
describe, is finding a hash function whose results are unique. Going beyond
64 bits is one way, but I suspect it only reduces the chance of a collision
rather than preventing it (I am not good at math). Maybe, when a hash
collision does occur, we could just append the remaining key parts as is done
today. Since collisions are very rare, this would dramatically reduce the
typical rowkey size, which would save space in the HFiles and help
performance in various ways.
So maybe there is no need to change the hash; just resolve the collision when
it happens? This is a non-backward-compatible change, so it is better done
sooner rather than later, but I am really excited about this idea!
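
A minimal sketch of the collision-fallback idea above, under loud assumptions: the names encode_parts, short_hash and assign_rowkey are hypothetical, a truncated SHA-256 stands in for whatever hash would really be used (it is not Trafodion's hash), and a plain dict stands in for the stored rows.

```python
import hashlib

def encode_parts(parts):
    """Length-prefix each key part so distinct part lists encode differently."""
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)

def short_hash(full_key):
    """Stand-in 64-bit hash: first 8 bytes of SHA-256 (NOT Trafodion's hash)."""
    return hashlib.sha256(full_key).digest()[:8]

def assign_rowkey(table, parts, hash_fn=short_hash):
    """Return the rowkey for `parts`, recording it in `table`.

    `table` is a plain dict standing in for the stored rows: it maps each
    rowkey to the full encoded key that owns it.
    """
    full = encode_parts(parts)
    short = hash_fn(full)
    owner = table.get(short)
    if owner is None or owner == full:
        table[short] = full      # common case: short key is free or already ours
        return short
    long_key = short + full      # rare case: collision, keep the full key as suffix
    table[long_key] = full
    return long_key
```

With this layout a reader would have to probe the short key first and compare the stored full key, then fall back to the long form. A real store would also need that check-then-write step to be atomic, which this sketch does not attempt.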

What I am trying to do is different, and just an experiment: I want to write
a simple prototype of a library that provides NoSQL-style access to Trafodion
tables. It would support only singleton IUD (insert/update/delete) and get
operations, and perhaps range scans as well. It would be pure Java code, so
it could be used from Spark/MR or from applications such as Storm, bypassing
the overhead of the connectivity and SQL layers and so performing a bit
better. It is odd: Trafodion builds SQL on top of Hadoop, so why bypass that
and read HBase directly? I don't have a good explanation, but I feel it may
have some use cases. One I have in mind is integrating with Spark, with
Trafodion as a data source. Today the only path is via JdbcRDD, and I think
providing another interface, if possible, would do no harm. I have heard that
some traditional RDBMSs provide a NoSQL interface as well, though I am not
sure what the use case is. So this is just an initial research effort to
check feasibility. I have wanted to do this for a long time.

To do that, I have to do all the encoding/decoding in that library and
calculate the correct HBase rowkey. I cannot find a good way to invoke the
current executor C++ code without involving the SQL compiler, so the simpler
way seems to be to copy all the logic and rewrite it in Java.

Thanks,
Ming

-----Original Message-----
From: Eric Owhadi [mailto:[email protected]]
Sent: February 12, 2016 23:29
To: [email protected]
Subject: RE: how the SALT is caculated?

Hi Ming,
I am not sure what you are trying to implement, but I am going to guess a use
case:

Sometimes the primary key in Trafodion is long and contains strings with a
large maximum length.
Given that these keys end up expanded and zero-padded in the HBase key, one
optimization would be to store a hash of these long strings in their place,
especially if we cannot benefit from keyed access.

For this use, making the hash unique is key. I experimented with this idea
using a 64-bit hash (calling hash2partfunc twice to build 64 bits) and loading
a table of 170,000,000 rows, and got duplicates (hash collisions). So if your
use case is along the same lines, please consider a hash function wider than
64 bits. The hash code used for partitioning does not care about collisions,
since it is used only for partitioning.
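
As a rough check on the hash width, here is a sketch of the standard birthday approximation. It assumes an ideal, uniformly distributed hash, which may not hold for two hash2partfunc values glued together; the function names are made up for illustration.

```python
import math

def expected_collisions(n_rows, bits):
    """Approximate expected number of colliding pairs for an ideal uniform hash."""
    return n_rows * (n_rows - 1) / 2 / 2**bits

def collision_probability(n_rows, bits):
    """Approximate probability of at least one collision (birthday bound)."""
    return -math.expm1(-expected_collisions(n_rows, bits))

n = 170_000_000
for bits in (32, 64, 128):
    print(bits, expected_collisions(n, bits), collision_probability(n, bits))
```

Under that assumption, 170,000,000 rows give a probability of any collision of a little under 0.1% at 64 bits, while at 32 bits millions of colliding pairs would be expected. A wider hash makes the probability smaller but never zero. Seeing duplicates at this size may suggest that the combined value carried fewer than 64 effective bits, though that is only a guess and has not been checked against the code.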

Not sure if this helps,
Regards,
Eric

-----Original Message-----
From: Liu, Ming (Ming) [mailto:[email protected]]
Sent: Friday, February 12, 2016 9:07 AM
To: [email protected]
Subject: RE: how the SALT is caculated?

Thanks QiFan,

Following your hint, I found ExHDPHash::eval() and the corresponding hash()
functions. I am now trying to understand them.

Thanks,
Ming

-----Original Message-----
From: Qifan Chen [mailto:[email protected]]
Sent: February 12, 2016 21:32
To: dev <[email protected]>
Subject: Re: how the SALT is caculated?

Hi Ming,

In Trafodion,

"salt using 8 partitions on A" is equivalent to "hash2partfunc(a for 8)".

"salt using 16 partitions on (a,b)" is equivalent to "hash2partfunc(a,b for 
16)".
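
To illustrate the shape of that mapping only, here is a sketch under stated assumptions: salt_for is a hypothetical name, CRC-32 is a stand-in 32-bit hash (NOT the hash Trafodion uses), and scaling the hash into the partition range is an assumed reduction step. The real logic is whatever hash2partfunc does in the Trafodion source.

```python
import zlib

def salt_for(key_columns, num_partitions):
    """Map already-encoded key columns (bytes) to a partition number.

    Stand-in only: CRC-32 replaces Trafodion's hash, and the scaling step
    below is an assumed way to reduce a 32-bit hash to a partition number.
    """
    h = zlib.crc32(b"\x00".join(key_columns)) & 0xFFFFFFFF
    # Scale the 32-bit value into the range [0, num_partitions).
    return (h * num_partitions) >> 32

# Analogue of "salt using 16 partitions on (a,b)"
print(salt_for([b"a-value", b"b-value"], 16))
```

The property that matters for a rowkey calculator is that the result is deterministic and always falls in [0, num_partitions). Matching Trafodion's actual _SALT_ values would require reproducing its hash and reduction exactly.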


Thanks --Qifan



On Fri, Feb 12, 2016 at 6:15 AM, Liu, Ming (Ming) <[email protected]> wrote:

> Hi, all,
>
> I want to check the code that calculates the hash value for the _SALT_
> column in Trafodion. Could anyone point me to the exact source code:
> which file and which function does that?
> I have tried for a while and cannot find it yet.
>
> The goal is to write a function F such that F(all clustering key columns)
> => rowkey of the Trafodion table row.
>
> Thanks,
> Ming
>
>


--
Regards, --Qifan
