Hello Sean,

Thank you for the heads-up!
The Interaction transform won't help for my use case, as it returns a vector
that I won't be able to hash.
I will definitely dig further into custom transformations, though.
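
For reference, here is a rough sketch of one possible shape for such a
pipeline, assuming a SQLTransformer stage can be used to build each cross
feature as a plain string column that FeatureHasher will accept (the column
names Fare, Sex and Survived are just placeholders from the Titanic dataset):

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import FeatureHasher, QuantileDiscretizer, SQLTransformer

# Bucketize a numeric column first, as planned.
discretizer = QuantileDiscretizer(numBuckets=4, inputCol="Fare",
                                  outputCol="FareBucket")

# Build one cross feature as a plain string column: FeatureHasher accepts
# numeric, bool and string columns, so concatenating the two values works.
cross = SQLTransformer(statement="""
    SELECT *, concat_ws('_', Sex, CAST(FareBucket AS STRING)) AS Sex_x_FareBucket
    FROM __THIS__
""")

# Hash the string cross feature into a fixed-size feature vector.
hasher = FeatureHasher(inputCols=["Sex_x_FareBucket"], outputCol="features",
                       numFeatures=1024)

lr = LogisticRegression(featuresCol="features", labelCol="Survived")

pipeline = Pipeline(stages=[discretizer, cross, hasher, lr])
# model = pipeline.fit(titanic_df)  # titanic_df has Fare, Sex, Survived columns

This only covers a single pair of columns, though; crossing every pair would
mean generating one such stage per pair, which is where a proper custom
transformation probably becomes the cleaner option.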

Thanks!
David

On Fri, Oct 1, 2021 at 3:49 PM, Sean Owen <sro...@gmail.com> wrote:

> Are you looking for
> https://spark.apache.org/docs/latest/ml-features.html#interaction ?
> That's the closest built-in thing I can think of. Otherwise you can make
> custom transformations.
>
> On Fri, Oct 1, 2021, 8:44 AM David Diebold <davidjdieb...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> In MLlib, I'm trying to rely essentially on pipelines to create features
>> out of the Titanic dataset and showcase the power of feature hashing. I
>> want to:
>>
>> - Apply bucketization on some columns (QuantileDiscretizer is fine).
>>
>> - Then I want to cross all my columns with each other to have cross
>> features.
>>
>> - Then I would like to hash all of these cross features into a vector.
>>
>> - Then give it to a logistic regression.
>>
>> Looking at the documentation, it seems the only way to hash features is
>> the *FeatureHasher* transformation. It takes multiple columns as input,
>> whose types can be numeric, bool, or string (but not vector/array).
>>
>> But now I'm left wondering how I can create my cross-feature columns. I'm
>> looking for a transformation that could take two columns as input and
>> return a numeric, bool, or string. I didn't manage to find anything that
>> does the job. There are several transformations, such as VectorAssembler,
>> that operate on vectors, but vector is not a type accepted by the
>> FeatureHasher.
>>
>> Of course, I could try to combine columns directly in my dataframe
>> (before the pipeline kicks in), but then I would no longer be able to
>> benefit from QuantileDiscretizer and other cool functions.
>>
>>
>> Am I missing something in the transformation API? Or is my approach to
>> hashing wrong? Or should we consider extending the API somehow?
>>
>>
>>
>> Thank you, kind regards,
>>
>> David
>>
>
