Re: Trying to hash cross features with mllib

2021-10-04 Thread David Diebold
Hello Sean, Thank you for the heads-up! The Interaction transform won't help for my use case, as it returns a vector that I won't be able to hash. I will definitely dig further into custom transformations, though. Thanks! David On Fri, Oct 1, 2021 at 15:49, Sean Owen wrote: > Are you looking

Re: Trying to hash cross features with mllib

2021-10-01 Thread Sean Owen
Are you looking for https://spark.apache.org/docs/latest/ml-features.html#interaction ? That's the closest built-in thing I can think of. Otherwise you can make custom transformations. On Fri, Oct 1, 2021, 8:44 AM David Diebold wrote: > Hello everyone, > > In MLlib, I'm trying to rely

Trying to hash cross features with mllib

2021-10-01 Thread David Diebold
Hello everyone, In MLlib, I'm trying to rely essentially on pipelines to create features out of the Titanic dataset, and showcase the power of feature hashing. I want to: - Apply bucketization on some columns (QuantileDiscretizer is fine) - Then I want to cross all my columns
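
For readers following the thread, here is a minimal sketch of the idea in plain Python. It is not MLlib code and not the poster's pipeline: the message is cut off in this archive view, so the final hashing step is assumed from the thread subject. The rows, the column names, and the helper functions (quantile_bucketizer, cross_features, hash_tokens, featurize) are all hypothetical, and the crossing is limited to pairs of columns to keep the example short.

```python
import hashlib
from bisect import bisect_right
from itertools import combinations
from statistics import quantiles


def quantile_bucketizer(values, num_buckets):
    """Return a function mapping a value to its quantile bucket index."""
    # quantiles() yields num_buckets - 1 cut points computed from the data.
    cuts = quantiles(values, n=num_buckets)
    return lambda v: bisect_right(cuts, v)


def cross_features(row):
    """Build one string token per pair of columns, e.g. 'age=0&sex=male'."""
    items = sorted(row.items())  # sort so the token text is order-independent
    return [f"{a}={x}&{b}={y}" for (a, x), (b, y) in combinations(items, 2)]


def hash_tokens(tokens, num_features):
    """Hashing trick: count each token into a fixed-size vector."""
    vec = [0] * num_features
    for tok in tokens:
        # md5 is used only as a stable hash (assumption for this sketch);
        # Python's built-in hash() is randomized between runs for strings.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:8], "big") % num_features] += 1
    return vec


# Made-up rows, loosely shaped like Titanic records (not real data).
rows = [
    {"age": 22.0, "fare": 7.25, "sex": "male"},
    {"age": 38.0, "fare": 71.28, "sex": "female"},
    {"age": 26.0, "fare": 7.92, "sex": "female"},
    {"age": 35.0, "fare": 53.10, "sex": "male"},
]

age_bucket = quantile_bucketizer([r["age"] for r in rows], 2)
fare_bucket = quantile_bucketizer([r["fare"] for r in rows], 2)


def featurize(row, num_features=16):
    """Bucketize numeric columns, cross all columns pairwise, then hash."""
    bucketed = {
        "age": age_bucket(row["age"]),
        "fare": fare_bucket(row["fare"]),
        "sex": row["sex"],
    }
    return hash_tokens(cross_features(bucketed), num_features)


vectors = [featurize(r) for r in rows]
```

Because each cross is turned into a string token before hashing, the output stays at a fixed width (16 here) regardless of how many distinct bucket combinations occur; collisions between tokens are possible and simply add into the same slot.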