[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mars Hall resolved PIO-106.
---------------------------
Resolution: Fixed

> Elasticsearch 5.x StorageClient should reuse RestClient
> -------------------------------------------------------
>
> Key: PIO-106
> URL: https://issues.apache.org/jira/browse/PIO-106
> Project: PredictionIO
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.11.0-incubating
> Reporter: Mars Hall
> Assignee: Mars Hall
>
> When using the proposed [PIO-105 Batch Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's REST interface appears to become overloaded, ending with the Spark job being killed by errors like:
> {noformat}
> [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
> [ERROR] [Utils] Aborting task
> [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
> [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749)
> [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737)
> [ERROR] [Common$] Invalid app name ur
> [ERROR] [Utils] Aborting task
> [ERROR] [URAlgorithm] Error when read recent events:
> java.lang.IllegalArgumentException: Invalid app name ur
> [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751)
> [ERROR] [Utils] Aborting task
> [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750)
> [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, executor driver): java.net.BindException: Can't assign requested address
> 	at sun.nio.ch.Net.connect0(Native Method)
> 	at sun.nio.ch.Net.connect(Net.java:454)
> 	at sun.nio.ch.Net.connect(Net.java:446)
> 	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
> 	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273)
> 	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139)
> 	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
> 	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
> 	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After these errors happen and the job is killed, Elasticsearch immediately recovers and responds to queries normally. I researched what could cause this and found an [old issue in the main Elasticsearch repo|https://github.com/elastic/elasticsearch/issues/3647]. Following the hints given there about *using keep-alive in the ES client* to avoid these performance issues, I investigated how PredictionIO's [Elasticsearch StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] manages its connections.
> I found that, unlike the other StorageClients (Elasticsearch1, HBase, JDBC), the Elasticsearch StorageClient creates a new underlying connection, an Elasticsearch RestClient, for [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] and [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] with its API.
> As a result, *there is no way Elasticsearch TCP connections can be reused via HTTP keep-alive*.
> High-performance workloads with Elasticsearch 5.x will suffer from these issues unless we refactor the Elasticsearch StorageClient to share the underlying RestClient instead of [building a new one every time the client is used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31].
> There are several approaches we could take to sharing a RestClient so that its keep-alive behavior works as designed:
> * maintain a singleton RestClient that is reused throughout the ES storage classes
> * create a RestClient on demand and pass it as an argument to ES storage methods
> * other ideas?
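A minimal Java sketch of the first two approaches follows. It is for illustration only and is not the fix that was merged. `PooledClient` is a hypothetical stand-in for Elasticsearch's `RestClient`. It only counts how many instances, and therefore how many separate connection pools, get built. `PerCallStore`, `SharedClientHolder`, `AppsStore`, `ChannelsStore` and `Pio106Demo` are also hypothetical names, not PredictionIO classes. The singleton variant uses the initialization-on-demand holder idiom, so lazy creation is thread-safe without explicit locking. The pass-as-argument variant builds one client up front and hands it to each storage object through its constructor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for org.elasticsearch.client.RestClient. Real code would build
// one with something like RestClient.builder(new HttpHost("localhost", 9200, "http")).build().
// This stub only counts instances, since each real instance owns its own connection pool.
final class PooledClient implements AutoCloseable {
    static final AtomicInteger CREATED = new AtomicInteger();

    PooledClient() {
        CREATED.incrementAndGet();
    }

    // Pretend request; a real client would send it over a pooled, kept-alive connection.
    String search(String path) {
        return "GET " + path;
    }

    @Override
    public void close() {
        // A real RestClient would release its pooled connections here.
    }
}

// Behavior described in PIO-106: a fresh client per call, so no TCP connection is reused.
final class PerCallStore {
    static String search(String path) {
        try (PooledClient client = new PooledClient()) {
            return client.search(path);
        }
    }
}

// Approach 1: one lazily created client shared by every storage class in this JVM.
// The nested Holder class is initialized (and the client built) on the first get() call;
// JVM class-initialization rules make this thread-safe without explicit locks.
final class SharedClientHolder {
    private SharedClientHolder() {}

    private static final class Holder {
        static final PooledClient INSTANCE = new PooledClient();
    }

    static PooledClient get() {
        return Holder.INSTANCE;
    }
}

// Approach 2: the caller creates one client and passes it to each storage object.
final class AppsStore {
    private final PooledClient client;

    AppsStore(PooledClient client) {
        this.client = client;
    }

    String all() {
        return client.search("/pio_meta/apps/_search");
    }
}

final class ChannelsStore {
    private final PooledClient client;

    ChannelsStore(PooledClient client) {
        this.client = client;
    }

    String all() {
        return client.search("/pio_meta/channels/_search");
    }
}

class Pio106Demo {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) PerCallStore.search("/pio_meta/apps/_search");
        System.out.println("per-call clients built: " + PooledClient.CREATED.get()); // 3

        int before = PooledClient.CREATED.get();
        for (int i = 0; i < 3; i++) SharedClientHolder.get().search("/pio_meta/apps/_search");
        System.out.println("singleton clients built: " + (PooledClient.CREATED.get() - before)); // 1

        before = PooledClient.CREATED.get();
        try (PooledClient shared = new PooledClient()) {
            new AppsStore(shared).all();
            new ChannelsStore(shared).all();
        }
        System.out.println("injected clients built: " + (PooledClient.CREATED.get() - before)); // 1
    }
}
```

In both variants the number of clients stays constant no matter how many queries run, which is what lets a real client's keep-alive connections be reused. In Spark each executor is its own JVM, so the singleton would still mean one client per executor. The holder approach leaves existing method signatures untouched. Passing the client explicitly makes its lifecycle, including when to call `close()`, the caller's responsibility.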