Hi, First off, for Elasticsearch questions it's worth pinging the Elastic mailing list, as it is more closely monitored than this one.
Back to your question, Jeetendra is right that the exception indicates no data is flowing back to the es-connector and Spark. The default timeout is 1m [1], which should be more than enough for a typical scenario. As a side note, the scroll size is 50 per task (so 150 suggests 3 tasks).

Once the query is made, scrolling through the documents is fast - likely something else is causing the connection to time out. In such cases, you can enable logging on the REST package and see what type of data transfer occurs between ES and Spark. Do note that if a GC occurs, it can freeze Elasticsearch (or Spark), which might trigger the timeout. Consider monitoring Elasticsearch during the query and see whether anything jumps out - in particular the memory pressure.

Hope this helps,

[1] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network

On 4/22/15 10:44 PM, Adrian Mocanu wrote:
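For what it's worth, the REST-layer logging mentioned above can be turned on through log4j. A sketch of a log4j.properties fragment - the `org.elasticsearch.hadoop.rest` package name comes from the stack trace below; the wire-level category for commons-httpclient is an assumption based on its usual conventions, so adjust to your logging setup:

```
# Trace the es-hadoop REST layer (request/response handling)
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
# Optionally, raw HTTP traffic from commons-httpclient (very verbose)
log4j.logger.httpclient.wire=DEBUG
```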
Hi, Thanks for the help. My ES is up. Out of curiosity, do you know what the timeout value is? There are probably other things happening to cause the timeout; I don't think my ES is that slow, but it's possible that ES is taking too long to find the data. What I see happening is that it uses scroll to get the data from ES, about 150 items at a time. The usual delay when I perform the same query from a browser plugin ranges from 1-5 sec.

Thanks

*From:* Jeetendra Gangele [mailto:gangele...@gmail.com]
*Sent:* April 22, 2015 3:09 PM
*To:* Adrian Mocanu
*Cc:* u...@spark.incubator.apache.org
*Subject:* Re: ElasticSearch for Spark times out

Basically, a read timeout means that no data arrived within the specified receive timeout period. A few things I would suggest:
1. Is your ES cluster up and running?
2. If yes, reduce the size of the index to a few KB and test again.

On 23 April 2015 at 00:19, Adrian Mocanu <amoc...@verticalscope.com <mailto:amoc...@verticalscope.com>> wrote:

Hi, I use the ElasticSearch package for Spark and very often it times out reading data from ES into an RDD. How can I keep the connection alive (why doesn't it? Bug?)
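If the default really is too tight for the cluster, the read timeout can be raised through connector settings. A minimal sketch, assuming the `es.http.timeout` and `es.scroll.size` keys from the elasticsearch-hadoop configuration reference; the 5m value is illustrative, not a recommendation:

```scala
// Connector settings one could pass when reading from Elasticsearch into
// an RDD, e.g. via the cfg overload: sc.esRDD("index/type", query, EsTuning.cfg)
object EsTuning {
  val cfg: Map[String, String] = Map(
    "es.http.timeout" -> "5m", // read/connect timeout (connector default: 1m)
    "es.scroll.size"  -> "50"  // docs fetched per scroll request, per task
  )

  def main(args: Array[String]): Unit =
    cfg.foreach { case (k, v) => println(s"$k=$v") }
}
```

Raising the timeout only masks the symptom, of course - if the stall is a GC pause or memory pressure on the ES side, that still needs to be tracked down.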
Here's the exception I get:

org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: java.net.SocketTimeoutException: Read timed out
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library.jar:na]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library.jar:na]
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) ~[spark-core_2.10-1.1.0.jar:1.1.0]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) ~[spark-core_2.10-1.1.0.jar:1.1.0]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.1.0.jar:1.1.0]
    at org.apache.spark.scheduler.Task.run(Task.scala:54) ~[spark-core_2.10-1.1.0.jar:1.1.0]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) ~[spark-core_2.10-1.1.0.jar:1.1.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_75]
    at java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_75]
    at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_75]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) ~[na:1.7.0_75]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.7.0_75]
    at org.apache.commons.httpclient.WireLogInputStream.read(WireLogInputStream.java:69) ~[commons-httpclient-3.1.jar:na]
    at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170) ~[commons-httpclient-3.1.jar:na]
    at java.io.FilterInputStream.read(FilterInputStream.java:133) ~[na:1.7.0_75]
    at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108) ~[commons-httpclient-3.1.jar:na]
    at org.elasticsearch.hadoop.rest.DelegatingInputStream.read(DelegatingInputStream.java:57) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.codehaus.jackson.impl.Utf8StreamParser.loadMore(Utf8StreamParser.java:172) ~[jackson-core-asl-1.9.11.jar:1.9.11]
    at org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1502) ~[jackson-core-asl-1.9.11.jar:1.9.11]
    at org.codehaus.jackson.impl.Utf8StreamParser.slowParseFieldName(Utf8StreamParser.java:1404) ~[jackson-core-asl-1.9.11.jar:1.9.11]
    at org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1231) ~[jackson-core-asl-1.9.11.jar:1.9.11]
    at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495) ~[jackson-core-asl-1.9.11.jar:1.9.11]
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:84) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    ... 22 common frames omitted
-- Costin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org