[ https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex DeCastro updated FLINK-5936:
---------------------------------

Hi Till,

By keyed vector I mean a vector that carries a unique identifier, so that when an ML algorithm outputs predictions I can refer back to the original (i.e., untransformed) data row. For example, say I'm clustering JIRA tickets: I'd like to include the identifier FLINK-XXXX in my Breeze vectors so that they can be inspected after clustering. From a domain expert's perspective, I want to verify that the clusters I get really do refer to similar topics.

In my case, I removed the project tag (i.e., "FLINK-") from the unique identifier and augmented my numerical vectors with one extra slot for the key. Then I modified my distance metric to ignore that extra coordinate. But it would be useful to have a field in the Vector class that can be initialized to a unique identifier.

Can you elaborate on the PredictDataSetOperation? I'm still new to Flink.

Thanks,
Alex

On 3/3/17, 2:58 PM, "Till Rohrmann (JIRA)" <j...@apache.org> wrote:

[ https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894510#comment-15894510 ]

Till Rohrmann commented on FLINK-5936:
--------------------------------------

Hi Alex,

what do you mean by keyed vectors? Did you mean labeled vectors? That is indeed not supported yet. But you could add a respective {{PredictDataSetOperation}} for {{KNN}}.

> Can't pass keyed vectors to KNN join algorithm
> ------------------------------------------------
>
>                 Key: FLINK-5936
>                 URL: https://issues.apache.org/jira/browse/FLINK-5936
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>    Affects Versions: 1.1.3
>            Reporter: Alex DeCastro
>            Priority: Minor
>
> Hi there,
> I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys from the predict method of KNN join, even if the Vector (FlinkVector) class gets extended to allow for keys.
> If I create a class, say SparseVectorsWithKeys, the predict method will return SparseVectors only. Any workarounds here?
> Would it be possible to either extend the Vector class or the ML models to consume and output keyed vectors? This is very important for NLP and for a lot of ML pipeline debugging, including logging.
> Thanks a lot
> Alex

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
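Alex describes an extra-slot workaround: strip the "FLINK-" prefix, store the numeric id as one extra coordinate, and make the distance metric skip that coordinate. It can be sketched in plain Scala.

This is a minimal, hypothetical sketch with these assumptions:
* It uses `Array[Double]` rather than FlinkML or Breeze vector types, so it runs without Flink.
* The names `withKeySlot`, `keyOf` and `maskedSquaredDistance` are invented for illustration.
* In FlinkML, the masking would presumably go into a custom distance metric handed to KNN. That wiring is not shown.

```scala
// Hypothetical sketch of the "extra key slot" workaround; not a FlinkML API.
object KeySlotWorkaround {
  // Append the numeric ticket id (e.g. 5936 for FLINK-5936) as the last coordinate.
  // Assumption: ids stay below 2^53, so the Double round trip is exact.
  def withKeySlot(key: Long, features: Array[Double]): Array[Double] =
    features :+ key.toDouble

  // Recover the key from the last coordinate.
  def keyOf(v: Array[Double]): Long = v.last.toLong

  // Squared Euclidean distance over every coordinate except the last (the key slot),
  // so the id never influences which points count as neighbours.
  def maskedSquaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length - 1) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }
}
```

The weak point of this approach is that every metric in the pipeline must know to skip the last coordinate. Any component that forgets will silently treat the id as a feature.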
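Till suggests a KNN prediction operation that carries keys through to the output. The issue also asks for models that "consume and output keyed vectors." Both ideas can be illustrated with a brute-force, in-memory sketch.

Assumptions:
* `KeyedVector` and `knn` are hypothetical names.
* The search is an exhaustive squared-Euclidean scan over plain Scala collections.
* Nothing here uses FlinkML's `PredictDataSetOperation` or the `DataSet` API.

In Flink, the same idea would mean emitting (query key, neighbour keys) pairs instead of bare vectors.

```scala
// Hypothetical sketch: keep the key outside the features and carry it through
// a brute-force k-nearest-neighbour search. Not FlinkML's KNN implementation.
object KeyedKnn {
  final case class KeyedVector(key: String, features: IndexedSeq[Double])

  private def squaredDistance(a: IndexedSeq[Double], b: IndexedSeq[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // For each query, return its key together with the keys of its k nearest
  // training points, ordered from nearest to farthest.
  def knn(training: Seq[KeyedVector], queries: Seq[KeyedVector], k: Int): Seq[(String, Seq[String])] =
    queries.map { q =>
      val neighbours = training
        .map(t => (squaredDistance(t.features, q.features), t.key))
        .sortWith(_._1 < _._1)
        .take(k)
        .map(_._2)
      (q.key, neighbours)
    }
}
```

Keeping the key in a wrapper means distance metrics never need to skip a coordinate. The cost is that the prediction step must be written to accept and return the wrapper type rather than plain vectors.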