This should be fine.
There is no standard for dealing with values outside of the bin edges.
IIRC, TensorFlow returns NaN for these values (probably to catch data
drift), whereas scikit-learn opens the end boundaries as you proposed.
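To make the two policies concrete, here is a minimal Python sketch (not the SystemDS transformencode code) of looking up a 1-based bin id against sorted edges. `encode_bin` and its `open_ends` flag are hypothetical names. The half-open convention [edges[i-1], edges[i]), with the top edge included in the last bin, is also an assumption, since we have not pinned down which bin a value sitting exactly on an inner edge belongs to.

```python
import bisect
import math


def encode_bin(x, edges, open_ends=False):
    """Return the 1-based bin id of x (as a float), or NaN.

    Hypothetical helper: `edges` holds n+1 sorted edges for n bins, and bin i
    covers [edges[i-1], edges[i]), with the top edge included in bin n.
    """
    if math.isnan(x):
        return math.nan  # NaN input maps to NaN under both policies
    n = len(edges) - 1
    if not open_ends and (x < edges[0] or x > edges[-1]):
        return math.nan  # strict policy: values outside the edges get NaN
    # Search only the inner edges edges[1..n-1]; ignoring the outer edges is
    # what opens the end boundaries, so x < edges[1] lands in bin 1 and
    # x >= edges[n-1] lands in bin n.
    return float(bisect.bisect_right(edges, x, 1, n))


edges = [float(i) for i in range(11)]  # bins 0-1, 1-2, ..., 9-10
print(encode_bin(0.1, edges), encode_bin(8.8, edges))  # 1.0 9.0
print(encode_bin(10.0000000001, edges))  # nan
print(encode_bin(10.0000000001, edges, open_ends=True))  # 10.0
```

With open_ends=True, -13 maps to bin 1 and 1324 to bin 10, while NaN input still returns NaN, which matches the behaviour you describe.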

Regards,
Arnab

On Fri, Sep 22, 2023 at 3:05 PM Sebastian Baunsgaard
<[email protected]> wrote:

> Hi all,
>
> We support binning in transform encode and return NaN if we try to
> encode values that do not fall in any bin.
>
> e.g. if the bins are:
>
> 0-1,1-2,2-3,3-4,4-5,5-6,6-7,7-8,8-9,9-10
>
> If we encode 0.1, we return 1 for bin 1; if we encode 8.8, we return 9.
>
> The issue I would like to address comes with values close to the
> boundaries, where potential rounding errors in double values make us
> return NaN on inputs like 10.0000000001.
>
> I suggest that we open the end boundaries to encode any value that is
> above or below the end buckets like:
>
> -inf-1,1-2,2-3,3-4,4-5,5-6,6-7,7-8,8-9,9-inf
>
> This means that -13 is in bucket 1 and 1324 is in bucket 10, while we
> keep NaN mapping to NaN.
>
> Best regards
> Sebastian

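The rounding problem shows up most directly when a bin id is computed arithmetically from the column minimum and maximum, as in equal-width binning, rather than by comparing against stored edges. The sketch below is an assumption about how such a lookup might look, not the SystemDS implementation; `equi_width_bin` and its parameters are hypothetical. Clamping the computed id is one simple way to open the end boundaries.

```python
import math


def equi_width_bin(x, lo, hi, n_bins, open_ends=True):
    """Hypothetical equal-width binning: 1-based bin id (as a float), or NaN.

    The bin id is derived from the column min `lo` and max `hi` instead of
    by comparing x against stored edges.
    """
    if math.isnan(x):
        return math.nan  # NaN input maps to NaN under both policies
    if not open_ends and (x < lo or x > hi):
        return math.nan  # strict policy: values outside [lo, hi] get NaN
    width = (hi - lo) / n_bins
    # floor((x - lo) / width) is evaluated in floating point, so a value that
    # sits on an inner edge can round into the neighbouring bin.
    b = math.floor((x - lo) / width) + 1
    # Clamp into [1, n_bins]: this puts x == hi into the last bin and, with
    # open_ends=True, sends everything below lo to bin 1 and above hi to bin n.
    return float(min(max(b, 1), n_bins))
```

The clamp only protects the two outer ends. Inner edges remain sensitive to rounding: with lo=0, hi=1 and 10 bins, 0.3 / 0.1 evaluates to 2.9999999999999996 in double precision, so 0.3 lands in bin 3 rather than bin 4.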