Is your ES cluster reachable from your Spark cluster through the network / firewall? Can you run the same query from the Spark master and slave nodes via curl / one of the other clients?
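For example, something along these lines from each Spark node (the host, index, and query below are placeholders):

    # basic reachability of the ES HTTP endpoint
    curl -s http://es-host:9200/
    # a query of the same shape the connector issues, one scroll-sized batch
    curl -s "http://es-host:9200/my-index/_search?q=*&size=50"

If these stall from the Spark nodes but the browser plugin works fine from your desktop, that points at the network rather than ES.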
Seems odd that GC issues would be a problem during the scan but not when running the query from a browser plugin... Sounds like it could be a network issue.

— Sent from Mailbox

On Thu, Apr 23, 2015 at 5:11 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi,
>
> If you get an ES response back in 1-5 seconds, that's pretty slow. Are these
> ES aggregation queries? Costin may be right about GC possibly causing
> timeouts. SPM <http://sematext.com/spm/> can give you all Spark and all
> key Elasticsearch metrics, including various JVM metrics. If the problem
> is GC, you'll see it. If you monitor both the Spark side and the ES side,
> you should be able to find some correlation with SPM.
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Wed, Apr 22, 2015 at 5:43 PM, Costin Leau <costin.l...@gmail.com> wrote:
>> Hi,
>>
>> First off, for Elasticsearch questions it's worth pinging the Elastic
>> mailing list, as it is monitored more closely than this one.
>>
>> Back to your question, Jeetendra is right that the exception indicates
>> no data is flowing back to the es-connector and Spark.
>> The default timeout is 1m [1], which should be more than enough for a
>> typical scenario. As a side note, the scroll size is 50 per task
>> (so 150 suggests 3 tasks).
>>
>> Once the query is made, scrolling the documents is fast - likely there's
>> something else at hand that causes the connection to time out.
>> In such cases, you can enable logging on the REST package and see what
>> type of data transfer occurs between ES and Spark.
>>
>> Do note that if a GC occurs, it can freeze Elasticsearch (or Spark), which
>> might trigger the timeout. Consider monitoring Elasticsearch during
>> the query and see whether anything jumps - in particular the memory
>> pressure.
>>
>> Hope this helps,
>>
>> [1]
>> http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network
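For reference, a minimal sketch of how the timeout and scroll size mentioned above can be adjusted from the Spark side; the property names (es.nodes, es.http.timeout, es.scroll.size) come from the configuration page in [1], while the host and the 5m value are placeholders:

    // Scala, matching the versions in the trace below
    // (Spark 1.1.0, elasticsearch-hadoop 2.1.0.Beta3)
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("es-scan")
      .set("es.nodes", "es-host:9200")  // placeholder ES node
      .set("es.http.timeout", "5m")     // default is 1m, per [1]
      .set("es.scroll.size", "50")      // default; 150 docs/round = 3 tasks x 50
    val sc = new SparkContext(conf)

The REST-package logging Costin mentions can be turned up in log4j.properties on the Spark side to watch the actual data transfer:

    log4j.logger.org.elasticsearch.hadoop.rest=TRACE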
>> On 4/22/15 10:44 PM, Adrian Mocanu wrote:
>>> Hi
>>>
>>> Thanks for the help. My ES is up.
>>>
>>> Out of curiosity, do you know what the timeout value is? There are
>>> probably other things happening to cause the timeout; I don’t think my
>>> ES is that slow, but it’s possible that ES is taking too long to find
>>> the data. What I see happening is that it uses scroll to get the data
>>> from ES, about 150 items at a time. The usual delay when I perform the
>>> same query from a browser plugin ranges from 1-5 sec.
>>>
>>> Thanks
>>>
>>> *From:* Jeetendra Gangele [mailto:gangele...@gmail.com]
>>> *Sent:* April 22, 2015 3:09 PM
>>> *To:* Adrian Mocanu
>>> *Cc:* u...@spark.incubator.apache.org
>>> *Subject:* Re: ElasticSearch for Spark times out
>>>
>>> Basically, a read timeout means that no data arrived within the
>>> specified receive timeout period.
>>>
>>> A few things I would suggest:
>>>
>>> 1. Is your ES cluster up and running?
>>>
>>> 2. If 1 is yes, then reduce the size of the index (make it a few KB) and
>>> then test.
>>>
>>> On 23 April 2015 at 00:19, Adrian Mocanu <amoc...@verticalscope.com
>>> <mailto:amoc...@verticalscope.com>> wrote:
>>>
>>> Hi
>>>
>>> I use the ElasticSearch package for Spark, and very often it times out
>>> reading data from ES into an RDD.
>>>
>>> How can I keep the connection alive (why doesn’t it? Bug?)
>>>
>>> Here’s the exception I get:
>>>
>>> org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: java.net.SocketTimeoutException: Read timed out
>>>     at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library.jar:na]
>>>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) ~[scala-library.jar:na]
>>>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library.jar:na]
>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library.jar:na]
>>>     at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:54) ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>>     at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_75]
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_75]
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_75]
>>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) ~[na:1.7.0_75]
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.7.0_75]
>>>     at org.apache.commons.httpclient.WireLogInputStream.read(WireLogInputStream.java:69) ~[commons-httpclient-3.1.jar:na]
>>>     at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170) ~[commons-httpclient-3.1.jar:na]
>>>     at java.io.FilterInputStream.read(FilterInputStream.java:133) ~[na:1.7.0_75]
>>>     at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108) ~[commons-httpclient-3.1.jar:na]
>>>     at org.elasticsearch.hadoop.rest.DelegatingInputStream.read(DelegatingInputStream.java:57) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     at org.codehaus.jackson.impl.Utf8StreamParser.loadMore(Utf8StreamParser.java:172) ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>     at org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1502) ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>     at org.codehaus.jackson.impl.Utf8StreamParser.slowParseFieldName(Utf8StreamParser.java:1404) ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>     at org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1231) ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>     at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495) ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>     at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:84) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>     ... 22 common frames omitted
>>
>> --
>> Costin
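For context, the kind of read that goes through the code path in the trace looks roughly like the sketch below; the index/type, query, and host are placeholders, and esRDD is the Spark API of the elasticsearch-hadoop connector visible in the stack frames:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._  // adds esRDD to SparkContext

    val sc = new SparkContext(
      new SparkConf().setAppName("es-read").set("es.nodes", "es-host:9200"))

    // Each task opens a scroll against its slice of the index and pulls
    // documents in batches (es.scroll.size per response); the read timeout
    // above fires when one of those scroll responses stalls.
    val docs = sc.esRDD("my-index/my-type", "?q=*")
    println(docs.count())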