Oh. Sorry :-)

On Mon, Sep 15, 2014 at 3:27 AM, Mark Walkom <ma...@campaignmonitor.com> wrote:
> You probably want to put this in your own thread :)
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
> On 15 September 2014 06:55, SAURAV PAUL <saurav....@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to use Spark and Elasticsearch together.
>>
>> Currently, the RDD contains pipe-delimited records:
>>
>>     parsedRDD.saveAsNewAPIHadoopFile(outputLocation,
>>             NullWritable.class,
>>             Text.class,
>>             CustomTextOutputFormat.class,
>>             job.getConfiguration());
>>
>> Right now I am storing the output in HDFS. Instead, I want to create an
>> index, store the output there, and use Kibana to do some analysis.
>>
>> What do I need to change so that I can push into Elasticsearch? Is it
>> EsOutputFormat?
>>
>> On Monday, July 7, 2014 11:14:47 PM UTC+5:30, Costin Leau wrote:
>>>
>>> Thanks for the analysis. It looks like Hadoop 1.0.4 has an invalid
>>> POM - though it ships Jackson 1.8.8 (see the distro), the POM declares
>>> version 1.0.1 for some reason. Hadoop 1.2 (the latest stable) and
>>> higher have this fixed.
>>>
>>> We don't pin the Jackson version in our POM since it's already
>>> available at runtime - we can probably do so going forward in the
>>> Spark integration.
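For the question above about pushing the RDD into Elasticsearch instead of HDFS: a minimal sketch, assuming elasticsearch-hadoop's `EsOutputFormat` and reusing the `job` and `parsedRDD` variables from the quoted message. The host and index names (`localhost:9200`, `spark/docs`) are placeholders, not values from the thread:

```java
// Sketch only: `job` and `parsedRDD` come from the message above;
// host and index names are placeholders.
Configuration conf = job.getConfiguration();
conf.set("es.nodes", "localhost:9200");   // Elasticsearch node(s)
conf.set("es.resource", "spark/docs");    // target index/type to write into

// EsOutputFormat ignores the output path; records are sent to Elasticsearch.
// Note: pipe-delimited Text lines are not documents by themselves - they
// would first need to be converted to MapWritable objects (or to JSON
// strings with es.input.json=true) so they can be indexed as fields.
parsedRDD.saveAsNewAPIHadoopFile("-",
        NullWritable.class,
        Text.class,
        EsOutputFormat.class,
        conf);
```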
>>> On Mon, Jul 7, 2014 at 6:39 PM, Brian Thomas <brianjt...@gmail.com> wrote:
>>>
>>>> Here is the gradle build I was using originally:
>>>>
>>>>     apply plugin: 'java'
>>>>     apply plugin: 'eclipse'
>>>>
>>>>     sourceCompatibility = 1.7
>>>>     version = '0.0.1'
>>>>     group = 'com.spark.testing'
>>>>
>>>>     repositories {
>>>>         mavenCentral()
>>>>     }
>>>>
>>>>     dependencies {
>>>>         compile 'org.apache.spark:spark-core_2.10:1.0.0'
>>>>         compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
>>>>         compile group: 'edu.stanford.nlp', name: 'stanford-corenlp',
>>>>                 version: '3.3.1', classifier: 'models'
>>>>         compile files('lib/elasticsearch-hadoop-2.0.0.jar')
>>>>         testCompile 'junit:junit:4.+'
>>>>         testCompile group: 'com.github.tlrx', name: 'elasticsearch-test',
>>>>                 version: '1.2.1'
>>>>     }
>>>>
>>>> When I ran dependencyInsight on jackson, I got the following output:
>>>>
>>>>     C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency jackson-core
>>>>     :dependencyInsight
>>>>     com.fasterxml.jackson.core:jackson-core:2.3.0
>>>>     \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
>>>>          +--- org.json4s:json4s-jackson_2.10:3.2.6
>>>>          |    \--- org.apache.spark:spark-core_2.10:1.0.0
>>>>          |         \--- compile
>>>>          \--- com.codahale.metrics:metrics-json:3.0.0
>>>>               \--- org.apache.spark:spark-core_2.10:1.0.0 (*)
>>>>
>>>>     org.codehaus.jackson:jackson-core-asl:1.0.1
>>>>     \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
>>>>          \--- org.apache.hadoop:hadoop-core:1.0.4
>>>>               \--- org.apache.hadoop:hadoop-client:1.0.4
>>>>                    \--- org.apache.spark:spark-core_2.10:1.0.0
>>>>                         \--- compile
>>>>
>>>> Version 1.0.1 of jackson-core-asl does not have the field
>>>> ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
>>>>
>>>> On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Glad to see you sorted out the problem. Out of curiosity, what version
>>>>> of jackson were you using and what was pulling it in?
>>>>> Can you share your maven pom/gradle build?
>>>>>
>>>>> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas <brianjt...@gmail.com> wrote:
>>>>>
>>>>>> I figured it out - a dependency issue in my classpath. Maven was
>>>>>> pulling down a very old version of the jackson jar. I added the
>>>>>> following line to my dependencies and the error went away:
>>>>>>
>>>>>>     compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
>>>>>>
>>>>>> On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:
>>>>>>>
>>>>>>> I am trying to query elasticsearch from Apache Spark using
>>>>>>> elasticsearch-hadoop. I am just trying to run a query against the
>>>>>>> elasticsearch server and return the count of results.
>>>>>>>
>>>>>>> Below is my test class using the Java API:
>>>>>>>
>>>>>>>     import org.apache.hadoop.conf.Configuration;
>>>>>>>     import org.apache.hadoop.io.MapWritable;
>>>>>>>     import org.apache.hadoop.io.Text;
>>>>>>>     import org.apache.spark.SparkConf;
>>>>>>>     import org.apache.spark.api.java.JavaPairRDD;
>>>>>>>     import org.apache.spark.api.java.JavaSparkContext;
>>>>>>>     import org.apache.spark.serializer.KryoSerializer;
>>>>>>>     import org.elasticsearch.hadoop.mr.EsInputFormat;
>>>>>>>
>>>>>>>     public class ElasticsearchSparkQuery {
>>>>>>>
>>>>>>>         public static int query(String masterUrl, String elasticsearchHostPort) {
>>>>>>>             SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
>>>>>>>             sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
>>>>>>>             JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);
>>>>>>>
>>>>>>>             Configuration conf = new Configuration();
>>>>>>>             conf.setBoolean("mapred.map.tasks.speculative.execution", false);
>>>>>>>             conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
>>>>>>>             conf.set("es.nodes", elasticsearchHostPort);
>>>>>>>             conf.set("es.resource", "media/docs");
>>>>>>>             conf.set("es.query", "?q=*");
>>>>>>>
>>>>>>>             JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
>>>>>>>                     EsInputFormat.class, Text.class, MapWritable.class);
>>>>>>>             return (int) esRDD.count();
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> When I try to run this I get the following error:
>>>>>>>
>>>>>>>     14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
>>>>>>>     14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
>>>>>>>     14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
>>>>>>>     14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
>>>>>>>     14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
>>>>>>>     java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
>>>>>>>         at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
>>>>>>>         at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
>>>>>>>         at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
>>>>>>>         at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
>>>>>>>         at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
>>>>>>>         at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
>>>>>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>>>>>>>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>>>>>         at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>>>>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>>>>>>>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>>>>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> Has anyone run into this issue with the JacksonJsonParser?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/9c2b2f2e-5196-4a72-bfbc-4cd0fda9edf0%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
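The fix Brian describes can also be expressed as a forced resolution rather than an extra compile dependency - a sketch, assuming a Gradle build like the one in this thread (the 1.9.13 version follows Brian's fix; any Jackson 1.x newer than 1.0.1 should avoid the missing `ALLOW_UNQUOTED_FIELD_NAMES` field):

```groovy
configurations.all {
    resolutionStrategy {
        // Override the jackson 1.0.1 that hadoop-core 1.0.4's broken POM
        // drags in transitively; force both the core and mapper artifacts
        // so they stay in lockstep.
        force 'org.codehaus.jackson:jackson-core-asl:1.9.13',
              'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
    }
}
```

Forcing keeps the old coordinates out of the graph entirely, whereas adding a direct dependency merely wins the version conflict; both approaches resolve the `NoSuchFieldError` here.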
--
Regards,
Saurav Paul
Bangalore