[ https://issues.apache.org/jira/browse/FLINK-30734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685034#comment-17685034 ]
Fan Hong commented on FLINK-30734:
----------------------------------

Sklearn has a discussion about this feature: [1]

SparkML already supports this feature in a similar algorithm named QuantileDiscretizer: [2]

[1] https://github.com/scikit-learn/scikit-learn/issues/9341
[2] https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.QuantileDiscretizer.html

> KBinsDiscretizer handles Double.NaN incorrectly
> -----------------------------------------------
>
>                 Key: FLINK-30734
>                 URL: https://issues.apache.org/jira/browse/FLINK-30734
>             Project: Flink
>          Issue Type: Bug
>          Components: Library / Machine Learning
>   Affects Versions: ml-2.1.0
>            Reporter: Fan Hong
>            Priority: Major
>
> When the training data contains Double.NaN values and the strategy is set to
> "quantile", the generated model data has Double.NaN as the right edge of the
> largest bin.
> My expected behavior is to ignore Double.NaN values when training, and to
> support skip/error/keep strategy when transforming with generated
> KBinsDiscretizerModel.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
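
The behavior requested in the issue can be illustrated with a small standalone sketch. This is not the Flink ML implementation: the class NanAwareQuantileBins, its methods, and the HandleInvalid enum are hypothetical names chosen for illustration, and the nearest-rank edge selection is an assumption (a real implementation may interpolate or deduplicate edges). It relies on one known fact: Java's Arrays.sort and DoubleStream.sorted order Double.NaN after every other double value, which is presumably how an unfiltered NaN ends up as the right edge of the largest bin.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of Flink ML: sketches the behavior requested in the issue.
final class NanAwareQuantileBins {

    /** How to treat NaN inputs at transform time (skip/error/keep, as named in the issue). */
    enum HandleInvalid { SKIP, ERROR, KEEP }

    /** Computes numBins + 1 quantile edges, ignoring NaN values in the training data. */
    static double[] binEdges(double[] values, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be >= 1");
        }
        // Drop NaN before sorting: sorted() places NaN after all other doubles,
        // so an unfiltered NaN would become the right edge of the last bin.
        double[] clean = Arrays.stream(values)
                .filter(v -> !Double.isNaN(v))
                .sorted()
                .toArray();
        if (clean.length == 0) {
            throw new IllegalArgumentException("no non-NaN training values");
        }
        double[] edges = new double[numBins + 1];
        for (int i = 0; i <= numBins; i++) {
            // Assumption: nearest-rank selection of the i/numBins quantile.
            int idx = (int) Math.round((double) i / numBins * (clean.length - 1));
            edges[i] = clean[idx];
        }
        return edges;
    }

    /**
     * Returns the bin index for value. For NaN input: -1 under SKIP (caller drops the row),
     * an extra bin index (numBins) under KEEP, and an exception under ERROR.
     */
    static int transform(double value, double[] edges, HandleInvalid handleInvalid) {
        int numBins = edges.length - 1;
        if (Double.isNaN(value)) {
            switch (handleInvalid) {
                case SKIP:
                    return -1;
                case KEEP:
                    return numBins; // extra bin after the regular ones
                default:
                    throw new IllegalArgumentException("NaN input with handleInvalid=ERROR");
            }
        }
        // Bins are [edges[i-1], edges[i]); out-of-range values fall into the first or last bin.
        for (int i = 1; i < numBins; i++) {
            if (value < edges[i]) {
                return i - 1;
            }
        }
        return numBins - 1;
    }
}
```

With training values {5, NaN, 1, 3, 2, 4} and two bins, binEdges returns {1.0, 3.0, 5.0}, so the NaN no longer leaks into the model data. The KEEP branch follows the convention used by Spark's QuantileDiscretizer of placing NaN in an additional bucket; whether Flink ML should adopt the same convention is left open here.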