Quick follow-up: this works sweetly with spark-1.1.1-bin-hadoop2.4.
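For reference, the working call looks roughly like this (a minimal sketch; the "campaigns/docs" resource and the match_all query are illustrative placeholders, not my actual index):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryES {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Campaigns")
    sparkConf.set("es.nodes", "es_cluster:9200")   // point es-hadoop at the cluster
    sparkConf.set("es.nodes.discovery", "false")   // stick to the declared node
    val sc = new SparkContext(sparkConf)

    // Query DSL passed as a JSON string; "campaigns/docs" is a placeholder index/type.
    val query = """{ "query": { "match_all": {} } }"""
    val campaigns = sc.esRDD("campaigns/docs", query)
    println(campaigns.count())
  }
}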
On Dec 3, 2014, at 3:31 PM, Ian Wilkinson ia...@me.com wrote:
Hi,
I'm trying the Elasticsearch support for Spark (elasticsearch-hadoop 2.1.0.Beta3).
In the following I provide the query as Query DSL:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryES {
  val sparkConf = new SparkConf().setAppName("Campaigns")
  sparkConf.set("es.nodes", "es_cluster:9200")
  sparkConf.set("es.nodes.discovery", "false")
  val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) {
    val query = """{
      "query": {
        ...
      }
    }"""
    // resource holds the target "index/type" string (value elided here)
    val campaigns = sc.esRDD(resource, query)
    campaigns.count()
  }
}
However, when I submit this (using spark-1.1.0-bin-hadoop2.4),
I get the following exception:
14/12/03 14:55:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
whose tasks have all completed, from pool
14/12/03 14:55:27 INFO scheduler.DAGScheduler: Failed to run count at
TryES.scala:...
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure:
Lost task 1.0 in stage 0.0 (TID 1, localhost):
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream
for resource {
query: {
...
}
}
Is the Query DSL supported with esRDD, or am I missing something
more fundamental?
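Alternatively, should the query be going through the es.query configuration
property rather than the esRDD argument? Sketching from the es-hadoop
configuration docs (the "campaigns/docs" resource is again a placeholder):

  // es.query accepts the Query DSL as a JSON string
  sparkConf.set("es.query", """{ "query": { "match_all": {} } }""")
  val campaigns = sc.esRDD("campaigns/docs")  // query picked up from es.query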
Huge thanks,
ian