(Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.

2017-06-28 Thread neha nihal
Thanks. It's working now. My test data had some labels that were not present in
the training set.

On Wednesday, June 28, 2017, Pralabh Kumar wrote:

> Hi Neha
>
> This generally occurs when your test data contains a value of a categorical
> variable that is not present in your training data. For example, you have a
> column DAYS with values M, T, W in the training data; when your test data
> contains F, you get a key-not-found exception. Please check for this, and if
> that's not the case, could you please share your code and your training/testing
> data for a better understanding?
>
> Regards
> Pralabh Kumar
>
> On Wed, Jun 28, 2017 at 11:45 AM, neha nihal wrote:
>
>>
>> Hi,
>>
>> I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
>> classification. A TF-IDF feature extractor is also used. The training part
>> runs without any issues and returns 100% accuracy. But when I try to
>> do prediction using the trained model and compute the test error, it fails with
>> java.util.NoSuchElementException: key not found.
>> Any help will be much appreciated.
>>
>> Thanks
>>
>>
>
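A minimal sketch of the fix described above (Scala, Spark 2.0.x ML API). It assumes, which the thread does not confirm, that the labels were indexed with a StringIndexer fit only on the training set; trainingData, testData, and the "label" column name are placeholders:

import org.apache.spark.ml.feature.StringIndexer

// Index the label column, skipping rows whose label was never seen during fitting
// instead of throwing "key not found". Alternatively, fit the indexer on the union
// of training and test data so every label receives an index.
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .setHandleInvalid("skip")        // "skip" is available in Spark 2.0.x
  .fit(trainingData)               // or .fit(trainingData.union(testData))

val indexedTest = labelIndexer.transform(testData)   // no longer fails on unseen labels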


Re: (Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.

2017-06-28 Thread Pralabh Kumar
Hi Neha

This generally occurs when your test data contains a value of a categorical
variable that is not present in your training data. For example, you have a
column DAYS with values M, T, W in the training data; when your test data
contains F, you get a key-not-found exception. Please check for this, and if
that's not the case, could you please share your code and your training/testing
data for a better understanding?

Regards
Pralabh Kumar

On Wed, Jun 28, 2017 at 11:45 AM, neha nihal  wrote:

>
> Hi,
>
> I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
> classification. A TF-IDF feature extractor is also used. The training part
> runs without any issues and returns 100% accuracy. But when I try to
> do prediction using the trained model and compute the test error, it fails with
> java.util.NoSuchElementException: key not found.
> Any help will be much appreciated.
>
> Thanks
>
>


Fwd: (Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.

2017-06-27 Thread neha nihal
Hi,

I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
classification. A TF-IDF feature extractor is also used. The training part
runs without any issues and returns 100% accuracy. But when I try to
do prediction using the trained model and compute the test error, it fails with
java.util.NoSuchElementException: key not found.
Any help will be much appreciated.

Thanks


(Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.

2017-06-27 Thread neha nihal
Hi,

I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
classification. A TF-IDF feature extractor is also used. The training part
runs without any issues and returns 100% accuracy. But when I try to
do prediction using the trained model and compute the test error, it fails with
java.util.NoSuchElementException: key not found.
Any help will be much appreciated.

Thanks & Regards


Re: java.util.NoSuchElementException: key not found error

2015-10-21 Thread Josh Rosen
This is https://issues.apache.org/jira/browse/SPARK-10422, which has been
fixed in Spark 1.5.1.
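If upgrading right away is not an option, one mitigation sometimes suggested for failures inside DictionaryEncoding (an assumption, not something confirmed in this thread) is to turn off compression for the in-memory columnar cache, so that code path is not exercised when the cached table is built. A sketch against the snippet from the original report:

// Sketch only: spark.sql.inMemoryColumnarStorage.compressed is an existing Spark SQL
// setting (default true). Whether disabling it sidesteps SPARK-10422 in every case is
// an assumption; upgrading to 1.5.1 remains the proper fix.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "false")

val splitData = merged.randomSplit(Array(0.7, 0.3))   // equivalent to Array(70, 30) after normalization
val trainData = splitData(0).persist()
trainData.registerTempTable("trn")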

On Wed, Oct 21, 2015 at 4:40 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> In 1.5.0, if I use randomSplit on a DataFrame I get this error.
>
> Here is the code snippet:
>
> val splitData = merged.randomSplit(Array(70,30))
> val trainData = splitData(0).persist()
> val testData = splitData(1)
>
> trainData.registerTempTable("trn")
>
> %sql select * from trn
>
> The exception goes like this -
>
> java.util.NoSuchElementException: key not found: 1910 at
> scala.collection.MapLike$class.default(MapLike.scala:228) at
> scala.collection.AbstractMap.default(Map.scala:58) at
> scala.collection.mutable.HashMap.apply(HashMap.scala:64) at
> org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
> at
> org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
> at
> org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at
> scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
> at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
> at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:262) at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at
> org.apache.spark.scheduler.Task.run(Task.scala:88) at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Any idea ?
>
> regards,
> Sourav
>


java.util.NoSuchElementException: key not found error

2015-10-21 Thread Sourav Mazumder
In 1.5.0, if I use randomSplit on a DataFrame I get this error.

Here is the code snippet:

val splitData = merged.randomSplit(Array(70,30))
val trainData = splitData(0).persist()
val testData = splitData(1)

trainData.registerTempTable("trn")

%sql select * from trn

The exception goes like this -

java.util.NoSuchElementException: key not found: 1910 at
scala.collection.MapLike$class.default(MapLike.scala:228) at
scala.collection.AbstractMap.default(Map.scala:58) at
scala.collection.mutable.HashMap.apply(HashMap.scala:64) at
org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
at
org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
at
org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
at
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
at
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:262) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at
org.apache.spark.scheduler.Task.run(Task.scala:88) at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Any idea ?

regards,
Sourav


Re: calling persist would cause java.util.NoSuchElementException: key not found:

2015-10-01 Thread Shixiong Zhu
Do you have the full stack trace? Could you check whether it is the same as
https://issues.apache.org/jira/browse/SPARK-10422

Best Regards,
Shixiong Zhu

2015-10-01 17:05 GMT+08:00 Eyad Sibai :

> Hi
>
> I am trying to call .persist() on a DataFrame, but once I execute the next
> line I get
> java.util.NoSuchElementException: key not found: ….
>
> I tried persisting to disk as well; same thing.
>
> I am using:
> pyspark with python3
> spark 1.5
>
>
> Thanks!
>
>
> EYAD SIBAI
> Risk Engineer
>
> *iZettle ®*
> ––
>
> Mobile: +46 72 911 60 54
> Web: www.izettle.com
>


calling persist would cause java.util.NoSuchElementException: key not found:

2015-10-01 Thread Eyad Sibai
Hi

I am trying to call .persist() on a DataFrame, but once I execute the next line
I get
java.util.NoSuchElementException: key not found: ….

I tried persisting to disk as well; same thing.


I am using:
pyspark with python3
spark 1.5




Thanks!



EYAD SIBAI
Risk Engineer

iZettle ®
––


Mobile: +46 72 911 60 54
Web: www.izettle.com

Re: Re: java.util.NoSuchElementException: key not found

2015-09-13 Thread guoqing0...@yahoo.com.hk
Thank you very much! When will Spark 1.5.1 come out?



guoqing0...@yahoo.com.hk
 
From: Yin Huai
Date: 2015-09-12 04:49
To: guoqing0...@yahoo.com.hk
CC: user
Subject: Re: java.util.NoSuchElementException: key not found
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422; it has
been fixed in branch-1.5, and the 1.5.1 release will include it.

On Fri, Sep 11, 2015 at 3:35 AM, guoqing0...@yahoo.com.hk 
 wrote:
Hi all,
After upgrading Spark to 1.5, Streaming occasionally throws
java.util.NoSuchElementException: key not found. Could the data be causing this
error? Please help if anyone has run into a similar problem before. Thanks very much.

The exception occurs when writing into the database.


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 
76, slave2): java.util.NoSuchElementException: key not found: 
ruixue.sys.session.request
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at 
org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
at 
org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
at 
org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
at 
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)



guoqing0...@yahoo.com.hk



Re: java.util.NoSuchElementException: key not found

2015-09-11 Thread Yin Huai
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422; it has
been fixed in branch-1.5, and the 1.5.1 release will include it.

On Fri, Sep 11, 2015 at 3:35 AM, guoqing0...@yahoo.com.hk <
guoqing0...@yahoo.com.hk> wrote:

> Hi all,
> After upgrading Spark to 1.5, Streaming occasionally throws
> java.util.NoSuchElementException: key not found. Could the data be causing
> this error? Please help if anyone has run into a similar problem before.
> Thanks very much.
>
> The exception occurs when writing into the database.
>
>
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 
> (TID 76, slave2): java.util.NoSuchElementException: key not found: 
> ruixue.sys.session.request
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>
> at 
> org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
>
> at 
> org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
>
> at 
> org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
>
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
>
> at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
>
> at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
>
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
>
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>
> --
> guoqing0...@yahoo.com.hk
>


java.util.NoSuchElementException: key not found

2015-09-11 Thread guoqing0...@yahoo.com.hk
Hi all,
After upgrading Spark to 1.5, Streaming occasionally throws
java.util.NoSuchElementException: key not found. Could the data be causing this
error? Please help if anyone has run into a similar problem before. Thanks very much.

The exception occurs when writing into the database.


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 
76, slave2): java.util.NoSuchElementException: key not found: 
ruixue.sys.session.request
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at 
org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
at 
org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
at 
org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
at 
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)



guoqing0...@yahoo.com.hk


RE: SparkSql - java.util.NoSuchElementException: key not found: node when accessing a JSON Array

2015-03-31 Thread java8964
You can use HiveContext instead of SQLContext; it supports all of HiveQL,
including LATERAL VIEW explode. SQLContext does not support that yet.
BTW, nice coding format in the email.
Yong
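A minimal sketch of what that looks like (Scala, Spark 1.2/1.3-era API; the metric table and pathElements column come from the thread, everything else is assumed):

import org.apache.spark.sql.hive.HiveContext

// HiveContext understands HiveQL, so LATERAL VIEW explode over the array column works.
val hiveContext = new HiveContext(sc)   // sc is the existing SparkContext
// Register the "metric" table against hiveContext the same way it was registered
// against sqlContext in the thread, then:
hiveContext.sql(
  """SELECT path, `timestamp`, name, value,
    |       pe.node AS element_node, pe.value AS element_value
    |FROM metric
    |LATERAL VIEW explode(pathElements) a AS pe""".stripMargin)
  .collect()
  .foreach(println)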

Date: Tue, 31 Mar 2015 18:18:19 -0400
Subject: Re: SparkSql - java.util.NoSuchElementException: key not found: node when accessing a JSON Array
From: tsind...@gmail.com
To: user@spark.apache.org

So in looking at this a bit more, I gather the root cause is that the nested
fields are represented as rows within rows, is that correct? If I don't know
the size of the JSON array (it varies), using x.getAs[Row](0).getString(0) is
not really a valid solution.
Is the solution to apply a lateral view + explode to this?
I have attempted to change to a lateral view, but it looks like my syntax is off:

sqlContext.sql(
"SELECT path,`timestamp`, name, value, pe.value FROM metric 
 lateral view explode(pathElements) a AS pe")
.collect.foreach(println(_))
Which results in:
15/03/31 17:38:34 INFO ContextCleaner: Cleaned broadcast 0
Exception in thread "main" java.lang.RuntimeException: [1.68] failure: 
``UNION'' expected but identifier view found

SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view 
explode(pathElements) a AS pe
   ^
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
at 
org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:174)
at 
org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:173)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:31)
at 
org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
at 
org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:83)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
at 
com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite$.main(ElasticSearchReadWrite.scala:97)
at 
com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite.main(ElasticSearchReadWrite.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is this the right approach? Is this syntax available in 1.2.1:
SELECT
  v1.name, v2.city, v2.state 
FROM people
  LATERAL VIEW json_tuple(people.jsonObject, 'name', 'address') v1 
 as name, address
  LATERAL VIEW json_tuple(v1.address, 'city', 'state') v2
 as city, state;
-Todd
On Tue, Mar 31, 2015 at 3:26 PM, Todd

Re: SparkSql - java.util.NoSuchElementException: key not found: node when accessing a JSON Array

2015-03-31 Thread Todd Nist
D(*"*device/metric")
> esData.collect.foreach(println(_))
>
> And results in this:
>
> 15/03/31 14:37:48 INFO DAGScheduler: Job 0 finished: collect at 
> ElasticSearchReadWrite.scala:67, took 4.948556 s
> (AUxxDrs4cgadF5SlaMg0,Map(pathElements -> Buffer(Map(node -> State, value -> 
> PA), Map(node -> City, value -> Pittsburgh), Map(node -> Street, value -> 
> 12345 Westbrook Drive), Map(node -> level, value -> main), Map(node -> 
> device, value -> thermostat)), value -> 29.590943279257175, name -> Current 
> Temperature, timestamp -> 2015-03-27T14:53:46+, path -> 
> /PA/Pittsburgh/12345 Westbrook Drive/main/theromostat-1))
>
> Yet this fails:
>
> sqlContext.sql("SELECT path, pathElements, `timestamp`, name, value FROM 
> metric").collect.foreach(println(_))
>
> With this exception:
>
> Create Metric Temporary Table for 
> querying#  Scheam Definition  
>  #
> root
> #  Data from SparkSQL 
>  #15/03/31 14:37:49 INFO 
> BlockManager: Removing broadcast 015/03/31 14:37:49 INFO BlockManager: 
> Removing block broadcast_015/03/31 14:37:49 INFO MemoryStore: Block 
> broadcast_0 of size 1264 dropped from memory (free 278018576)15/03/31 
> 14:37:49 INFO BlockManager: Removing block broadcast_0_piece015/03/31 
> 14:37:49 INFO MemoryStore: Block broadcast_0_piece0 of size 864 dropped from 
> memory (free 278019440)15/03/31 14:37:49 INFO BlockManagerInfo: Removed 
> broadcast_0_piece0 on 192.168.1.5:57820 in memory (size: 864.0 B, free: 265.1 
> MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of block 
> broadcast_0_piece015/03/31 14:37:49 INFO BlockManagerInfo: Removed 
> broadcast_0_piece0 on 192.168.1.5:57834 in memory (size: 864.0 B, free: 530.0 
> MB)15/03/31 14:37:49 INFO ContextCleaner: Cleaned broadcast 015/03/31 
> 14:37:49 INFO ScalaEsRowRDD: Reading from [device/metric]15/03/31 14:37:49 
> INFO ScalaEsRowRDD: Discovered mapping 
> {device=[mappings=[metric=[name=STRING, path=STRING, 
> pathElements=[node=STRING, value=STRING], pathId=STRING, timestamp=DATE, 
> value=DOUBLE]]]} for [device/metric]15/03/31 14:37:49 INFO SparkContext: 
> Starting job: collect at SparkPlan.scala:8415/03/31 14:37:49 INFO 
> DAGScheduler: Got job 1 (collect at SparkPlan.scala:84) with 1 output 
> partitions (allowLocal=false)15/03/31 14:37:49 INFO DAGScheduler: Final 
> stage: Stage 1(collect at SparkPlan.scala:84)15/03/31 14:37:49 INFO 
> DAGScheduler: Parents of final stage: List()15/03/31 14:37:49 INFO 
> DAGScheduler: Missing parents: List()15/03/31 14:37:49 INFO DAGScheduler: 
> Submitting Stage 1 (MappedRDD[6] at map at SparkPlan.scala:84), which has no 
> missing parents15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(4120) 
> called with curMem=0, maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: 
> Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 
> 265.1 MB)15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(2403) called 
> with curMem=4120, maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: Block 
> broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 
> 265.1 MB)15/03/31 14:37:49 INFO BlockManagerInfo: Added broadcast_1_piece0 in 
> memory on 192.168.1.5:57820 (size: 2.3 KB, free: 265.1 MB)15/03/31 14:37:49 
> INFO BlockManagerMaster: Updated info of block broadcast_1_piece015/03/31 
> 14:37:49 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:83815/03/31 14:37:49 INFO DAGScheduler: Submitting 1 
> missing tasks from Stage 1 (MappedRDD[6] at map at 
> SparkPlan.scala:84)15/03/31 14:37:49 INFO TaskSchedulerImpl: Adding task set 
> 1.0 with 1 tasks15/03/31 14:37:49 INFO TaskSetManager: Starting task 0.0 in 
> stage 1.0 (TID 1, 192.168.1.5, NODE_LOCAL, 3731 bytes)15/03/31 14:37:50 INFO 
> BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.5:57836 
> (size: 2.3 KB, free: 530.0 MB)15/03/31 14:37:52 WARN TaskSetManager: Lost 
> task 0.0 in stage 1.0 (TID 1, 192.168.1.5): java.util.NoSuchElementException: 
> key not found: node
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.MapLike$class.apply(MapLike.scala:141)
> at scala.collection.AbstractMap.apply(Map.scala:58)
> at 
> org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:32)
> at 
> org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaRowValueReader.scala:9)
> at 
> org.elasticsearch.spark.sql.Sc

SparkSql - java.util.NoSuchElementException: key not found: node when accessing a JSON Array

2015-03-31 Thread Todd Nist
 Map(node ->
Street, value -> 12345 Westbrook Drive), Map(node -> level, value ->
main), Map(node -> device, value -> thermostat)), value ->
29.590943279257175, name -> Current Temperature, timestamp ->
2015-03-27T14:53:46+, path -> /PA/Pittsburgh/12345 Westbrook
Drive/main/theromostat-1))

Yet this fails:

sqlContext.sql("SELECT path, pathElements, `timestamp`, name, value
FROM metric").collect.foreach(println(_))

With this exception:

Create Metric Temporary Table for
querying#  Scheam
Definition   #
root
#  Data from SparkSQL
#15/03/31 14:37:49
INFO BlockManager: Removing broadcast 015/03/31 14:37:49 INFO
BlockManager: Removing block broadcast_015/03/31 14:37:49 INFO
MemoryStore: Block broadcast_0 of size 1264 dropped from memory (free
278018576)15/03/31 14:37:49 INFO BlockManager: Removing block
broadcast_0_piece015/03/31 14:37:49 INFO MemoryStore: Block
broadcast_0_piece0 of size 864 dropped from memory (free
278019440)15/03/31 14:37:49 INFO BlockManagerInfo: Removed
broadcast_0_piece0 on 192.168.1.5:57820 in memory (size: 864.0 B,
free: 265.1 MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info
of block broadcast_0_piece015/03/31 14:37:49 INFO BlockManagerInfo:
Removed broadcast_0_piece0 on 192.168.1.5:57834 in memory (size: 864.0
B, free: 530.0 MB)15/03/31 14:37:49 INFO ContextCleaner: Cleaned
broadcast 015/03/31 14:37:49 INFO ScalaEsRowRDD: Reading from
[device/metric]15/03/31 14:37:49 INFO ScalaEsRowRDD: Discovered
mapping {device=[mappings=[metric=[name=STRING, path=STRING,
pathElements=[node=STRING, value=STRING], pathId=STRING,
timestamp=DATE, value=DOUBLE]]]} for [device/metric]15/03/31 14:37:49
INFO SparkContext: Starting job: collect at SparkPlan.scala:8415/03/31
14:37:49 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:84)
with 1 output partitions (allowLocal=false)15/03/31 14:37:49 INFO
DAGScheduler: Final stage: Stage 1(collect at
SparkPlan.scala:84)15/03/31 14:37:49 INFO DAGScheduler: Parents of
final stage: List()15/03/31 14:37:49 INFO DAGScheduler: Missing
parents: List()15/03/31 14:37:49 INFO DAGScheduler: Submitting Stage 1
(MappedRDD[6] at map at SparkPlan.scala:84), which has no missing
parents15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(4120)
called with curMem=0, maxMem=27801944015/03/31 14:37:49 INFO
MemoryStore: Block broadcast_1 stored as values in memory (estimated
size 4.0 KB, free 265.1 MB)15/03/31 14:37:49 INFO MemoryStore:
ensureFreeSpace(2403) called with curMem=4120,
maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: Block
broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB,
free 265.1 MB)15/03/31 14:37:49 INFO BlockManagerInfo: Added
broadcast_1_piece0 in memory on 192.168.1.5:57820 (size: 2.3 KB, free:
265.1 MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of
block broadcast_1_piece015/03/31 14:37:49 INFO SparkContext: Created
broadcast 1 from broadcast at DAGScheduler.scala:83815/03/31 14:37:49
INFO DAGScheduler: Submitting 1 missing tasks from Stage 1
(MappedRDD[6] at map at SparkPlan.scala:84)15/03/31 14:37:49 INFO
TaskSchedulerImpl: Adding task set 1.0 with 1 tasks15/03/31 14:37:49
INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1,
192.168.1.5, NODE_LOCAL, 3731 bytes)15/03/31 14:37:50 INFO
BlockManagerInfo: Added broadcast_1_piece0 in memory on
192.168.1.5:57836 (size: 2.3 KB, free: 530.0 MB)15/03/31 14:37:52 WARN
TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 192.168.1.5):
java.util.NoSuchElementException: key not found: node
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at 
org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:32)
at 
org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaRowValueReader.scala:9)
at 
org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaRowValueReader.scala:16)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.list(ScrollReader.java:560)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:522)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
at 
org.elasticsearch.ha

Re: java.util.NoSuchElementException: key not found:

2015-03-02 Thread Rok Roskar
aha ok, thanks.

If I create different RDDs from a parent RDD and force evaluation
thread-by-thread, then it should presumably be fine, correct? Or do I need
to checkpoint the child RDDs as a precaution, in case they need to be removed
from memory and recomputed?
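A sketch of that pattern (Scala; whether it is enough to avoid the race is not settled in this thread, so treat it as an illustration only): each derived RDD is materialized sequentially on one thread before any other thread touches it.

import org.apache.spark.rdd.RDD

def runOnSubsets(base: RDD[Double]): Unit = {
  // Split the parent and force evaluation of each child on this thread first.
  val subsets = base.randomSplit(Array(0.5, 0.5)).map { rdd =>
    rdd.persist()
    rdd.count()                       // materialize before any concurrent use
    rdd
  }
  // Only then fan out; each thread works with an already-materialized RDD.
  val threads = subsets.map { rdd =>
    new Thread(new Runnable {
      def run(): Unit = {
        println(s"subset size = ${rdd.count()}")   // stand-in for the per-subset training job
      }
    })
  }
  threads.foreach(_.start())
  threads.foreach(_.join())
}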

On Sat, Feb 28, 2015 at 4:28 AM, Shixiong Zhu  wrote:

> RDD is not thread-safe. You should not use it in multiple threads.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-02-27 23:14 GMT+08:00 rok :
>
>> I'm seeing this java.util.NoSuchElementException: key not found: exception
>> pop up sometimes when I run operations on an RDD from multiple threads in
>> a
>> python application. It ends up shutting down the SparkContext so I'm
>> assuming this is a bug -- from what I understand, I should be able to run
>> operations on the same RDD from multiple threads or is this not
>> recommended?
>>
>> I can't reproduce it all the time and I've tried eliminating caching
>> wherever possible to see if that would have an effect, but it doesn't seem
>> to. Each thread first splits the base RDD and then runs the
>> LogisticRegressionWithSGD on the subset.
>>
>> Is there a workaround to this exception?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: java.util.NoSuchElementException: key not found:

2015-02-27 Thread Shixiong Zhu
RDD is not thread-safe. You should not use it in multiple threads.

Best Regards,
Shixiong Zhu

2015-02-27 23:14 GMT+08:00 rok :

> I'm seeing this java.util.NoSuchElementException: key not found: exception
> pop up sometimes when I run operations on an RDD from multiple threads in a
> python application. It ends up shutting down the SparkContext so I'm
> assuming this is a bug -- from what I understand, I should be able to run
> operations on the same RDD from multiple threads or is this not
> recommended?
>
> I can't reproduce it all the time and I've tried eliminating caching
> wherever possible to see if that would have an effect, but it doesn't seem
> to. Each thread first splits the base RDD and then runs the
> LogisticRegressionWithSGD on the subset.
>
> Is there a workaround to this exception?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


java.util.NoSuchElementException: key not found:

2015-02-27 Thread rok
I'm seeing this java.util.NoSuchElementException: key not found: exception
pop up sometimes when I run operations on an RDD from multiple threads in a
python application. It ends up shutting down the SparkContext so I'm
assuming this is a bug -- from what I understand, I should be able to run
operations on the same RDD from multiple threads or is this not recommended? 

I can't reproduce it all the time and I've tried eliminating caching
wherever possible to see if that would have an effect, but it doesn't seem
to. Each thread first splits the base RDD and then runs the
LogisticRegressionWithSGD on the subset.  

Is there a workaround to this exception? 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.util.NoSuchElementException: key not found

2014-09-16 Thread Brad Miller
Hi All,

I suspect I am experiencing a bug. I've noticed that while running
larger jobs, they occasionally die with the exception
"java.util.NoSuchElementException: key not found xyz", where "xyz"
denotes the ID of some particular task.  I've excerpted the log from
one job that died in this way below and attached the full log for
reference.

I suspect that my bug is the same as SPARK-2002 (linked below).  Is
there any reason to suspect otherwise?  Is there any known workaround
other than not coalescing?
https://issues.apache.org/jira/browse/SPARK-2002
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCAMwrk0=d1dww5fdbtpkefwokyozltosbbjqamsqqjowlzng...@mail.gmail.com%3E

Note that I have been coalescing SchemaRDDs using "srdd =
SchemaRDD(srdd._jschema_rdd.coalesce(partitions, False, None),
sqlCtx)", the workaround described in this thread.
http://mail-archives.apache.org/mod_mbox/spark-user/201409.mbox/%3ccanr-kkciei17m43-yz5z-pj00zwpw3ka_u7zhve2y7ejw1v...@mail.gmail.com%3E

...
14/09/15 21:43:14 INFO scheduler.TaskSetManager: Starting task 78.0 in
stage 551.0 (TID 78738, bennett.research.intel-research.net,
PROCESS_LOCAL, 1056 bytes)
...
14/09/15 21:43:15 INFO storage.BlockManagerInfo: Added
taskresult_78738 in memory on
bennett.research.intel-research.net:38074 (size: 13.0 MB, free: 1560.8
MB)
...
14/09/15 21:43:15 ERROR scheduler.TaskResultGetter: Exception while
getting task result
java.util.NoSuchElementException: key not found: 78738
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at 
org.apache.spark.scheduler.TaskSetManager.handleTaskGettingResult(TaskSetManager.scala:500)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleTaskGettingResult(TaskSchedulerImpl.scala:348)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:52)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)


I am running the pre-compiled 1.1.0 binaries.

best,
-Brad

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org