[ https://issues.apache.org/jira/browse/FLINK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735864#comment-15735864 ]

Chesnay Schepler commented on FLINK-5299:
-----------------------------------------

That's a good question. I've looked into it a bit more and now think that my
original idea was actually kinda bad :/ (I pretty much only accounted for
the case of a single key that is an array.)

What we could maybe do is the following (note that this is purely theoretical;
I haven't tried any of it out yet):

Add a new method to TypeComparator that extracts hash-stable keys:

{code}
public int extractHashStableKeys(Object record, Object[] target, int index) {
        // default: delegate to extractKeys, so existing implementations don't break
        return extractKeys(record, target, index);
}
{code}

The TupleComparator implementation would look like this (it is identical to
extractKeys, except that it calls extractHashStableKeys on the field comparators):
{code}
@Override
public int extractHashStableKeys(Object record, Object[] target, int index) {
        int localIndex = index;
        for (int i = 0; i < comparators.length; i++) {
                // delegate to the field comparator, so array fields get hashed too
                localIndex += comparators[i].extractHashStableKeys(
                        ((Tuple) record).getField(keyPositions[i]), target, localIndex);
        }
        // number of key slots written for this record
        return localIndex - index;
}
{code}

Finally, we add the following method to the primitive array comparator:

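{code}
// shown for int[]; Arrays.hashCode has one overload per primitive array type,
// so each primitive array comparator casts to its own array type
// (requires: import java.util.Arrays;)
@Override
public int extractHashStableKeys(Object record, Object[] target, int index) {
        // store a content-based hash instead of the array itself, whose
        // hashCode() is identity-based and therefore not stable
        target[index] = Arrays.hashCode((int[]) record);
        return 1;
}
{code}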
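To make the motivation concrete: Java arrays inherit equals() and hashCode() from Object, so two arrays with the same elements behave as different keys. The following standalone sketch (plain Java, no Flink classes; ArrayHashDemo and its methods are made-up names for illustration) shows a HashMap lookup missing for an equal-content array, while Arrays.hashCode agrees for both:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ArrayHashDemo {

    // Looks up `probe` in a map populated with `stored` as the key.
    // Arrays use Object's identity-based equals()/hashCode(), so this only
    // succeeds if `probe` is the very same array instance as `stored`.
    static String lookupByArray(int[] stored, int[] probe) {
        Map<int[], String> map = new HashMap<>();
        map.put(stored, "value");
        return map.get(probe);
    }

    // Content-based hash: equal elements always give equal hashes.
    static int contentHash(int[] array) {
        return Arrays.hashCode(array);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3}; // same content, different instance

        System.out.println(lookupByArray(a, b));              // null
        System.out.println(contentHash(a) == contentHash(b)); // true
    }
}
{code}

This is exactly why the array comparator above writes Arrays.hashCode(...) into the key slot rather than the array itself.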

There you go.
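For anyone who wants to try the idea outside Flink, here is a self-contained toy model of the three pieces above. KeyComparator, IntComparator, IntArrayComparator and RowComparator are hypothetical, heavily simplified stand-ins (not Flink's TypeComparator/TupleComparator), and rows are plain Object[] instead of Tuples; it only sketches the delegation scheme under those assumptions:

{code}
import java.util.Arrays;

class HashStableKeysSketch {

    // Hypothetical stand-in for TypeComparator: only key extraction is modelled.
    abstract static class KeyComparator {
        // Writes the key(s) of `record` into target[index..]; returns how many.
        abstract int extractKeys(Object record, Object[] target, int index);

        // Proposed hook: defaults to extractKeys so existing comparators keep working.
        int extractHashStableKeys(Object record, Object[] target, int index) {
            return extractKeys(record, target, index);
        }
    }

    // Scalar key: a boxed Integer already has a stable, value-based hashCode.
    static class IntComparator extends KeyComparator {
        @Override
        int extractKeys(Object record, Object[] target, int index) {
            target[index] = record;
            return 1;
        }
    }

    // Array key: overrides the hook to emit a content-based hash.
    static class IntArrayComparator extends KeyComparator {
        @Override
        int extractKeys(Object record, Object[] target, int index) {
            target[index] = record; // the array itself: identity-based hashCode
            return 1;
        }

        @Override
        int extractHashStableKeys(Object record, Object[] target, int index) {
            target[index] = Arrays.hashCode((int[]) record);
            return 1;
        }
    }

    // Tuple-like composite over Object[] rows; recurses into field comparators.
    static class RowComparator extends KeyComparator {
        private final int[] keyPositions;
        private final KeyComparator[] comparators;

        RowComparator(int[] keyPositions, KeyComparator[] comparators) {
            this.keyPositions = keyPositions;
            this.comparators = comparators;
        }

        @Override
        int extractKeys(Object record, Object[] target, int index) {
            int localIndex = index;
            for (int i = 0; i < comparators.length; i++) {
                Object field = ((Object[]) record)[keyPositions[i]];
                localIndex += comparators[i].extractKeys(field, target, localIndex);
            }
            return localIndex - index;
        }

        @Override
        int extractHashStableKeys(Object record, Object[] target, int index) {
            int localIndex = index;
            for (int i = 0; i < comparators.length; i++) {
                Object field = ((Object[]) record)[keyPositions[i]];
                localIndex += comparators[i].extractHashStableKeys(field, target, localIndex);
            }
            return localIndex - index;
        }
    }

    // Hash-stable keys of a row keyed on (int field 0, int[] field 1).
    static Object[] stableKeys(Object[] row) {
        RowComparator cmp = new RowComparator(
                new int[] {0, 1},
                new KeyComparator[] {new IntComparator(), new IntArrayComparator()});
        Object[] keys = new Object[2];
        cmp.extractHashStableKeys(row, keys, 0);
        return keys;
    }

    public static void main(String[] args) {
        Object[] row1 = {42, new int[] {1, 2, 3}, "payload-1"};
        Object[] row2 = {42, new int[] {1, 2, 3}, "payload-2"};
        // Equal key content gives equal keys, even though the arrays are distinct.
        System.out.println(Arrays.equals(stableKeys(row1), stableKeys(row2))); // true
    }
}
{code}

The default method leaves scalar comparators untouched; only the array comparator overrides it and the composite just recurses, which is what lets arrays at any key position be handled rather than only a single array key.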

> DataStream support for arrays as keys
> -------------------------------------
>
>                 Key: FLINK-5299
>                 URL: https://issues.apache.org/jira/browse/FLINK-5299
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>    Affects Versions: 1.2.0
>            Reporter: Chesnay Schepler
>              Labels: star
>
> It is currently not possible to use an array as a key in the DataStream API,
> as it relies on hash codes, which aren't stable for arrays.
> One way to implement this would be to check for the key type and inject a
> KeySelector that calls "Arrays.hashCode(values)".
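The KeySelector variant from the issue description could look roughly like this. It is a sketch under assumptions: java.util.function.Function stands in for Flink's KeySelector, and partitionFor is a hypothetical modulo-based helper, not how Flink actually assigns keys to subtasks:

{code}
import java.util.Arrays;
import java.util.function.Function;

class ArrayKeySelectorSketch {

    // Stand-in for an injected KeySelector: instead of using the int[] key
    // directly, map it to its content-based hash (Arrays.hashCode(int[])).
    static final Function<int[], Integer> ARRAY_KEY = Arrays::hashCode;

    // Hypothetical, simplified partitioning: plain modulo over hashCode().
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3}; // equal content, separate instance
        int parallelism = 4;
        // Through the selector, both arrays land on the same partition.
        System.out.println(partitionFor(ARRAY_KEY.apply(a), parallelism)
                == partitionFor(ARRAY_KEY.apply(b), parallelism)); // true
    }
}
{code}

The trade-off versus the comparator approach is that the key becomes the hash itself, so two different arrays with colliding hashes would be treated as the same key.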



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
