[ 
https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex DeCastro updated FLINK-5936:
---------------------------------

Hi Till, by keyed vector I mean a vector that carries some sort of unique 
identifier, so that when an ML algo spits out predictions I can refer back to 
the original (read: untransformed) data row.

So for example, let’s say I’m clustering JIRA tickets: then I’d like to attach 
the identifier FLINK-XXXX to my Breeze vectors so that the clusters can be 
inspected afterwards. I’d like to check, from a domain-expert perspective, that 
the clusters I get do refer to similar topics.

In my case, I removed the project tag (i.e. FLINK-) from the unique identifier 
and augmented my numerical vectors with one extra slot for the key. Then I 
modified my distance metric to ignore that extra coordinate.
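
The workaround above could look roughly like the following. This is a minimal 
sketch in plain Scala on Array[Double], not Breeze or Flink ML code; the object 
name and the helpers withKey, keyOf and squaredDistanceIgnoringKey are 
hypothetical names chosen for illustration. It assumes the key is the last 
coordinate and that the numeric ticket id is small enough to be stored exactly 
in a Double (below 2^53).

```scala
object KeySlotSketch {
  // Hypothetical helper: append the numeric part of the ticket id
  // (e.g. 5936 for FLINK-5936) as one extra trailing coordinate.
  def withKey(features: Array[Double], key: Long): Array[Double] =
    features :+ key.toDouble

  // Hypothetical helper: read the key back from the trailing slot.
  def keyOf(keyed: Array[Double]): Long = keyed.last.toLong

  // Squared Euclidean distance that skips the trailing key slot,
  // so the identifier never influences the neighbour search.
  def squaredDistanceIgnoringKey(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length && a.nonEmpty,
      "vectors must be non-empty and of equal length")
    var sum = 0.0
    var i = 0
    while (i < a.length - 1) { // stop before the key slot
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }
}
```

The drawback of this approach is that every distance metric used with the 
augmented vectors has to know about the key slot.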

But it would be useful to have a variable in the Vector class that can be 
initialized to a unique identifier.
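
As an illustration of the request, a keyed vector could be as simple as the 
sketch below. It is plain Scala with a brute-force neighbour search, only meant 
to show the identifier surviving the prediction step; KeyedVector and 
nearestIds are hypothetical names and are not part of Flink ML or Breeze.

```scala
object KeyedKnnSketch {
  // Hypothetical wrapper: the id travels with the values but is never
  // part of the distance computation.
  final case class KeyedVector(id: String, values: Array[Double])

  private def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Brute-force k nearest neighbours: returns the ids of the k training
  // vectors closest to the query, nearest first.
  def nearestIds(query: KeyedVector, training: Seq[KeyedVector], k: Int): Seq[String] =
    training
      .map(t => (t.id, squaredDistance(query.values, t.values)))
      .sortWith(_._2 < _._2)
      .take(k)
      .map(_._1)
}
```

With something like this the distance metric stays untouched, and the ids of 
the neighbours can be mapped straight back to the original tickets.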

Can you elaborate on the PredictDataSetOperation? I’m still new to Flink. 
Thanks, Alex

On 3/3/17, 2:58 PM, "Till Rohrmann (JIRA)" <j...@apache.org> wrote:


        [ 
https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894510#comment-15894510
 ]

    Till Rohrmann commented on FLINK-5936:
    --------------------------------------

    Hi Alex,

    what do you mean by keyed vectors?

    Did you mean labeled vectors? That is indeed not supported yet. But you
    could add a respective {{PredictDataSetOperation}} for {{KNN}}.

    > Can't pass keyed vectors to KNN join algorithm
    > ------------------------------------------------
    >
    >                 Key: FLINK-5936
    >                 URL: https://issues.apache.org/jira/browse/FLINK-5936
    >             Project: Flink
    >          Issue Type: Improvement
    >          Components: Machine Learning Library
    >    Affects Versions: 1.1.3
    >            Reporter: Alex DeCastro
    >            Priority: Minor
    >
    > Hi there,
    > I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys
    > from the predict method of KNN join even if the Vector (FlinkVector) class
    > gets extended to allow for keys.
    > If I create a class say, SparseVectorsWithKeys the predict method will
    > return SparseVectors only. Any workarounds here?
    > Would it be possible to either extend the Vector class or the ML models
    > to consume and output keyed vectors?  This is very important to NLP and
    > pretty much a lot of ML pipeline debugging -- including logging.
    > Thanks a lot
    > Alex



    --
    This message was sent by Atlassian JIRA
    (v6.3.15#6346)






