Hi Costin,

Thanks for your help. I saw your message, but it's taken me a while to resolve and build against the nightly snapshot. I'm using SBT and couldn't work out how to add and resolve the dependency, so I eventually pulled the file down manually and dropped it into my lib directory; I can now compile again against elasticsearch-spark_2.10-2.1.0.BUILD-20150130.023537-206.jar. I guess the SBT side is probably a question for another post, but anyway...
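(For the record, in case it helps anyone else searching later: I suspect something along these lines in build.sbt is the proper way to pull the nightly in, but I haven't verified that the nightlies actually live in the Sonatype OSS snapshots repository, which is why I fell back to dropping the jar into lib/.)

    // untested sketch - repository URL and snapshot version are my guesses
    resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

    libraryDependencies += "org.elasticsearch" % "elasticsearch-spark_2.10" % "2.1.0.BUILD-SNAPSHOT"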
The bad news is that I still get more or less the same problem, now with Jackson in the trace. The stack trace below is from the Map version, but I get a very similar exception from the JSON string version. Somewhere along the line it seems very unhappy about the \s in the parentId.

The good news is that I can get the case class variant working with a PairRDD when I set both ID and PARENT in the metadata and use saveToEsWithMeta (rough sketch at the bottom of this message, below the quoted thread). That gets me back on the road, but I think there's still an issue with setting the parent from either JSON or a Map via es.mapping.parent in master.

Thanks again,

Neil

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, SERVER1): org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unrecognized character escape 's' (code 115) at [Source: [B@2a7b201e; line: 3, column: 17]
    org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.text(JacksonJsonParser.java:153)
    org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:211)
    org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:150)
    org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:201)
    org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
    org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
    org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:145)
    org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:47)
    org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:51)
    org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:51)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

On Thursday, 29 January 2015 23:00:57 UTC, Costin Leau wrote:
>
> Not sure if you've seen my previous message but please try out the master.
>
> On Fri, Jan 30, 2015 at 12:59 AM, Neil Andrassy <neil.a...@thefilter.com> wrote:
>
>> I get the same problem with the json string approach too.
>>
>> On Thursday, 29 January 2015 22:24:07 UTC, Neil Andrassy wrote:
>>>
>>> I'm using ES-Hadoop 1.2.0.Beta3 Spark variant with Scala 2.10.4 and
>>> Spark 1.1.0 Hadoop 2.4 (but without an actual Hadoop installation - I'm
>>> running on Windows).
>>>
>>> I'm working with a Map-based RDD rather than json.
>>>
>>> https://gist.github.com/andrassy/273179ed7cb01a38973d is a short
>>> example that throws an exception.
>>>
>>> I'll also try the json approach and see if that works for me.
>>>
>>> Thanks,
>>>
>>> Neil
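P.S. In case it's useful to anyone hitting the same thing, this is roughly the shape of the saveToEsWithMeta workaround I mentioned above. It's a trimmed-down sketch rather than my actual code: the case class, IDs, index/type name and es.nodes value are made up for illustration, and the parent/child mapping is assumed to already exist in the index.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._
    import org.elasticsearch.spark.rdd.Metadata.{ID, PARENT}

    // hypothetical document type, just to show the shape of the call
    case class Child(name: String, value: Int)

    object SaveWithParentSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("parent-child-sketch")
          .setMaster("local[2]")
          .set("es.nodes", "localhost")   // my real cluster settings differ

        val sc = new SparkContext(conf)

        // Pair RDD: key = metadata map carrying both ID and PARENT,
        // value = the document itself (a case class works fine here)
        val docs = sc.makeRDD(Seq(
          (Map(ID -> "child-1", PARENT -> "parent-1"), Child("first", 1)),
          (Map(ID -> "child-2", PARENT -> "parent-1"), Child("second", 2))
        ))

        // "myindex/child" is illustrative; the child type needs a _parent
        // mapping pointing at the parent type for this to index cleanly
        docs.saveToEsWithMeta("myindex/child")

        sc.stop()
      }
    }

Setting the parent through the metadata like this works for me, whereas es.mapping.parent on the Map/JSON variants still throws the Jackson exception above.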