Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148

> This is very common in academic research and literature, but it may not be in industry. I'm fine with not considering it for now.

Ok, makes sense. For the `transform` case, if users are looking to use the hash signatures directly as a lower-dimensional representation, they can always set `L=1` and `d` (assuming we do AND + OR later) to get just one "vector" output.

For the public vals - sorry if I wasn't clear. I meant we should probably not expose them until the API is fully baked. But yes, I see that they are useful to expose once we're happy with the API. I just don't love the idea of changing things later (and throwing errors and whatnot) if we can avoid it - I think we saw similar issues with e.g. NaiveBayes.

> What about outputting a Matrix instead of an Array of Vectors? That will make it easy to change in the future, without us having weird Vectors of length 1.

Matrix can work - I don't think `Array[Vector]` is an issue either. I seem to recall a comment above that Matrix was a bit less easy to work with (exploding indices and so on). I don't see a big difference between an Lx1 matrix and an L-length array of 1-d vectors in practical terms, so I'm ok with either approach.

I'll check the JIRA - sorry I missed the links.
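To make the shape discussion concrete, here is a minimal sketch (plain Python, not the PR's actual API; all names are illustrative) of the two candidate output layouts for the `L` hash signatures of a single row: an L-length array of 1-d vectors versus an Lx1 matrix. The underlying values are identical either way; only the wrapper differs.

```python
def as_array_of_vectors(sigs):
    """L hash values as a list of L one-element vectors (Array[Vector] style)."""
    return [[s] for s in sigs]

def as_matrix(sigs):
    """The same L values as an L x 1 matrix, here (numRows, numCols, values)."""
    return (len(sigs), 1, list(sigs))

sigs = [3.0, 7.0, 1.0]  # L = 3 hash signatures for one input row
vecs = as_array_of_vectors(sigs)
rows, cols, values = as_matrix(sigs)

# Same underlying data in both representations; only the container differs.
assert [v for vec in vecs for v in vec] == values
print(vecs)        # [[3.0], [7.0], [1.0]]
print(rows, cols)  # 3 1
```

Either layout could later generalize to `d > 1` (AND-amplification): the vectors grow to length `d`, or the matrix becomes Lxd, without changing the outer shape users iterate over.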