[ https://issues.apache.org/jira/browse/PIO-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079322#comment-16079322 ]
ASF GitHub Bot commented on PIO-102: ------------------------------------ GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/406 [PIO-102] Fix ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) Fix for [PIO-102](https://issues.apache.org/jira/browse/PIO-102) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio fix-es-getall-order Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/406.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #406 ---- commit 34fb0de8ae91f3bf9edb7a9823ea1784555845a8 Author: Mars Hall <m...@heroku.com> Date: 2017-07-08T01:22:14Z Append Elasticsearch scroll results to maintain order ---- > ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) > ------------------------------------------------------------------- > > Key: PIO-102 > URL: https://issues.apache.org/jira/browse/PIO-102 > Project: PredictionIO > Issue Type: Bug > Components: Core > Affects Versions: 0.11.0-incubating > Reporter: Mars Hall > Assignee: Mars Hall > > Using the new Elasticsearch 5.x REST storage client as the meta storage > source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in > conf/pio-env.sh), I found that once an engine has been trained a certain > number of times, that the most recent engine instance is no longer retrieved. > So, I tracked down where those Elasticsearch queries originate. > In the original Elasticsearch 1.x storage client, [the "scroll" pagination > responses are collected by > *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] > them to one another. > In the new Elasticsearch 5.x client, [the "scroll" responses are collected by > *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152] > them to one another. > This out-of-order concatenation breaks [ESEngineInstances > `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192] > by erroneously replacing the head of the results with an older engine > instance, when there are enough engine instances to overflow a single page of > Elasticsearch hits. > I've observed this buggy behavior after ten trainings, when enough engine > instances are stored to trigger Elasticsearch's scroll feature. > I'll be opening a pull request shortly with the super-simple fix. -- This message was sent by Atlassian JIRA (v6.4.14#64029)