aruggero opened a new pull request, #1257:
URL: https://github.com/apache/solr/pull/1257

   https://issues.apache.org/jira/browse/SOLR-16596
   
   # Description
   In some scenarios, a null value for a feature has a different meaning than a 
zero value. Some models are trained to take this distinction into account (e.g. 
https://xgboost.readthedocs.io/en/stable/faq.html#how-to-deal-with-missing-values).
 This contribution makes it possible for _MultipleAdditiveTrees_ models to 
handle these two feature values differently. With the default configuration, 
a null and a zero value still have the same meaning.
   
   # Solution
   An additional "_missing_" branch parameter has been introduced to 
differentiate the model behavior. It defines the branch to follow when the 
corresponding feature value is null.
   To manage null values, the "_myFeatures.json_" file needs to be modified: a 
"_defaultValue_" parameter with a "_NaN_" value must be added to each 
feature that can be null.
   The model configuration also needs two additional parameters. 
"_isNullSameAsZero_" must be defined in the model "_params_" and set to 
"_false_", and the "_missing_" parameter must be added to each branch 
whose feature supports null values. "_missing_" takes one of two values: 
"_left_" or "_right_".
   
   
_solr/modules/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java_
 has been modified. The _isNullSameAsZero_ variable has been introduced to 
declare whether zeros should be differentiated from nulls, and the _missing_ 
branch has been added to the tree to define the direction to take when the 
feature value is null.
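
   As a rough illustration of the traversal described above, here is a minimal 
sketch. It is not the code from this patch: the `MissingBranchSketch` and `Node` 
names, the map-based feature lookup, and the `<=` comparison for the left 
branch are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch only; names and structure are not taken from the patch.
final class MissingBranchSketch {

  /** A tree node: a leaf carries a value, a split carries a feature test. */
  static final class Node {
    final String feature;   // null for leaves
    final float threshold;  // used only by splits
    final Node left;
    final Node right;
    final String missing;   // "left", "right", or null when not configured
    final float value;      // used only by leaves

    private Node(String feature, float threshold, Node left, Node right,
                 String missing, float value) {
      this.feature = feature;
      this.threshold = threshold;
      this.left = left;
      this.right = right;
      this.missing = missing;
      this.value = value;
    }

    static Node leaf(float value) {
      return new Node(null, 0f, null, null, null, value);
    }

    static Node split(String feature, float threshold, Node left, Node right,
                      String missing) {
      return new Node(feature, threshold, left, right, missing, 0f);
    }

    boolean isLeaf() {
      return feature == null;
    }
  }

  /** Walks the tree; an absent feature is represented as NaN (assumption). */
  static float score(Node node, Map<String, Float> features,
                     boolean isNullSameAsZero) {
    while (!node.isLeaf()) {
      Float raw = features.get(node.feature);
      float v = (raw == null) ? Float.NaN : raw;
      if (Float.isNaN(v)) {
        if (!isNullSameAsZero && node.missing != null) {
          // Null is distinct from zero: follow the configured "missing" branch.
          node = "left".equals(node.missing) ? node.left : node.right;
          continue;
        }
        v = 0f; // default behavior: a null is treated like a zero
      }
      node = (v <= node.threshold) ? node.left : node.right;
    }
    return node.value;
  }
}
```

   With a split on threshold 0.5 and `missing` set to "right", a null feature 
reaches the right leaf when `isNullSameAsZero` is false, while a zero value (or 
a null with the default configuration) reaches the left leaf.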
   
   # Tests
   A new _multipleadditivetreesmodel_features_with_missing_branch.json_ file 
and two additional _MultipleAdditiveTreesModel_ files 
(_multipleadditivetreesmodel_with_missing_branch.json_ and 
_multipleadditivetreesmodel_with_missing_branch_for_interleaving.json_) have 
been added to test the new capability.
   A new test has been added in 
_solr/modules/ltr/src/test/org/apache/solr/ltr/model/TestMultipleAdditiveTreesModel.java_
 to test the new behavior with null values:
   - _testMultipleAdditiveTreesWithNulls()_
   
   Additional tests have also been added to check the sparse/dense format 
behavior when dealing with null values. In 
_solr/modules/ltr/src/test/org/apache/solr/ltr/response/transform/TestFeatureLoggerTransformer.java_
 the new tests are:
   - _featureTransformer_shouldWorkInSparseFormat_withNulls()_
   - _featureTransformer_shouldWorkInDenseFormat_withNulls()_
   - _interleaving_featureTransformer_shouldWorkInSparseFormat_withNulls()_
   - _interleaving_featureTransformer_shouldWorkInDenseFormat_withNulls()_
   
   For features with a default value of NaN, zero values should also appear in 
the sparse format, since zero is no longer the default value for them.
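
   The sparse-format expectation can be illustrated with a small sketch. It 
assumes that a sparse log omits every feature whose value equals its default; 
the `SparseLogSketch` class, the `name=value` output shape, and the fallback 
default of zero are hypothetical and do not reproduce the feature logger code.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch only; this is not the Solr feature logger.
final class SparseLogSketch {

  /** Emits only the features whose value differs from their default value. */
  static String toSparse(Map<String, Float> values, Map<String, Float> defaults) {
    StringJoiner out = new StringJoiner(",");
    for (Map.Entry<String, Float> e : values.entrySet()) {
      // Assumption: a feature without an explicit default falls back to zero.
      float def = defaults.getOrDefault(e.getKey(), 0f);
      // Float.compare treats NaN as equal to NaN, unlike the == operator.
      if (Float.compare(e.getValue(), def) != 0) {
        out.add(e.getKey() + "=" + e.getValue());
      }
    }
    return out.toString();
  }
}
```

   Under these assumptions, a feature with a NaN default and a value of zero is 
logged, a feature with a NaN default that is still NaN is omitted, and a 
feature with the usual zero default and a value of zero is omitted as before.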
   
   # Checklist
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   - [X] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


