(Spark-ml) java.util.NoSuchElementException: key not found exception on doing prediction and computing test error.

2017-06-28 Thread neha nihal
Thanks. It's working now. My test data had some labels which were not present
in the training set.

On Wednesday, June 28, 2017, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Neha
>
> This generally occurs when your test data contains a value of a categorical
> variable that was not present in your training data. For example, you have a
> column DAYS with values M, T, W in the training data; when your test data
> then contains F, you get the key-not-found exception. Please look into
> this, and if that's not the case, could you please share your code
> and training/testing data for better understanding.
>
> Regards
> Pralabh Kumar
>
> On Wed, Jun 28, 2017 at 11:45 AM, neha nihal <nehaniha...@gmail.com>
> wrote:
>
>>
>> Hi,
>>
>> I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
>> classification. A TF-IDF feature extractor is also used. The training part
>> runs without any issues and returns 100% accuracy. But when I try to do
>> prediction using the trained model and compute the test error, it fails with
>> a java.util.NoSuchElementException: key not found exception.
>> Any help will be much appreciated.
>>
>> Thanks
>>
>>
>
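
For reference, a minimal sketch of one way to avoid this in Spark 2.0.2 ML,
assuming the labels are indexed with a StringIndexer (the original code was not
shared, so the data set and column names below are placeholders): fit the
indexer on the full data set so every label gets an index, or set handleInvalid
to "skip" so rows with values unseen at fit time are dropped instead of throwing.

import org.apache.spark.ml.feature.StringIndexer

// Placeholder names: fullData covers train + test and has a string label column "category".
val labelIndexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("label")
  .setHandleInvalid("skip")     // drop rows whose value was not seen during fit
  .fit(fullData)                // alternatively: .fit(trainDF.union(testDF))

val indexedTrain = labelIndexer.transform(trainDF)
val indexedTest  = labelIndexer.transform(testDF)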


Re: (Spark-ml) java.util.NoSuchElementException: key not found exception on doing prediction and computing test error.

2017-06-28 Thread Pralabh Kumar
Hi Neha

This generally occurs when your test data contains a value of a categorical
variable that was not present in your training data. For example, you have a
column DAYS with values M, T, W in the training data; when your test data
then contains F, you get the key-not-found exception. Please look into
this, and if that's not the case, could you please share your code
and training/testing data for better understanding.

Regards
Pralabh Kumar

On Wed, Jun 28, 2017 at 11:45 AM, neha nihal <nehaniha...@gmail.com> wrote:

>
> Hi,
>
> I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
> classification. A TF-IDF feature extractor is also used. The training part
> runs without any issues and returns 100% accuracy. But when I try to do
> prediction using the trained model and compute the test error, it fails with
> a java.util.NoSuchElementException: key not found exception.
> Any help will be much appreciated.
>
> Thanks
>
>


Fwd: (Spark-ml) java.util.NoSuchElementException: key not found exception on doing prediction and computing test error.

2017-06-28 Thread neha nihal
Hi,

I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
classification. A TF-IDF feature extractor is also used. The training part
runs without any issues and returns 100% accuracy. But when I try to do
prediction using the trained model and compute the test error, it fails with
a java.util.NoSuchElementException: key not found exception.
Any help will be much appreciated.

Thanks


(Spark-ml) java.util.NoSuchElementException: key not found exception on doing prediction and computing test error.

2017-06-27 Thread neha nihal
Hi,

I am using Apache Spark 2.0.2 RandomForest ML (standalone mode) for text
classification. A TF-IDF feature extractor is also used. The training part
runs without any issues and returns 100% accuracy. But when I try to do
prediction using the trained model and compute the test error, it fails with
a java.util.NoSuchElementException: key not found exception.
Any help will be much appreciated.

Thanks & Regards


java.util.NoSuchElementException: key not found error

2015-10-21 Thread Sourav Mazumder
In 1.5.0, if I use randomSplit on a DataFrame I get this error.

Here is the code snippet:

val splitData = merged.randomSplit(Array(70,30))
val trainData = splitData(0).persist()
val testData = splitData(1)

trainData.registerTempTable("trn")

%sql select * from trn

The exception goes like this -

java.util.NoSuchElementException: key not found: 1910
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
    at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
    at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Any idea?

regards,
Sourav


Re: java.util.NoSuchElementException: key not found error

2015-10-21 Thread Josh Rosen
This is https://issues.apache.org/jira/browse/SPARK-10422, which has been
fixed in Spark 1.5.1.
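
For reference, if upgrading right away is not an option, the sketch below shows
one possible stopgap. It is an assumption based on the stack trace (which fails
while dictionary-compressing cached columns), not an officially documented
workaround: disable compressed in-memory columnar storage before caching, and
keep the original split code otherwise unchanged.

// sqlContext is the existing SQLContext. Assumed stopgap for 1.5.0 only;
// upgrading to 1.5.1 is the real fix. Disabling compression avoids the
// DictionaryEncoding code path seen in the trace.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "false")

val splitData = merged.randomSplit(Array(70.0, 30.0))
val trainData = splitData(0).persist()
val testData  = splitData(1)
trainData.registerTempTable("trn")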

On Wed, Oct 21, 2015 at 4:40 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> In 1.5.0, if I use randomSplit on a DataFrame I get this error.
>
> Here is the code snippet:
>
> val splitData = merged.randomSplit(Array(70,30))
> val trainData = splitData(0).persist()
> val testData = splitData(1)
>
> trainData.registerTempTable("trn")
>
> %sql select * from trn
>
> The exception goes like this -
>
> java.util.NoSuchElementException: key not found: 1910 at
> scala.collection.MapLike$class.default(MapLike.scala:228) at
> scala.collection.AbstractMap.default(Map.scala:58) at
> scala.collection.mutable.HashMap.apply(HashMap.scala:64) at
> org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
> at
> org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
> at
> org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at
> scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
> at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
> at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:262) at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at
> org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at
> org.apache.spark.scheduler.Task.run(Task.scala:88) at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Any idea ?
>
> regards,
> Sourav
>


Re: calling persist would cause java.util.NoSuchElementException: key not found:

2015-10-02 Thread Shixiong Zhu
Do you have the full stack trace? Could you check if it's the same as
https://issues.apache.org/jira/browse/SPARK-10422

Best Regards,
Shixiong Zhu

2015-10-01 17:05 GMT+08:00 Eyad Sibai <eyad.alsi...@gmail.com>:

> Hi
>
> I am trying to call .persist() on a DataFrame, but once I execute the next
> line I get
> java.util.NoSuchElementException: key not found: ….
>
> I tried persisting to disk as well; same thing.
>
> I am using:
> pyspark with python3
> spark 1.5
>
>
> Thanks!
>
>
> EYAD SIBAI
> Risk Engineer
>
> *iZettle ®*
> ––
>
> Mobile: +46 72 911 60 54 <+46%2072%20911%2060%2054>
> Web: www.izettle.com <http://izettle.com/>
>


calling persist would cause java.util.NoSuchElementException: key not found:

2015-10-01 Thread Eyad Sibai
Hi

I am trying to call .persist() on a DataFrame, but once I execute the next line
I get
java.util.NoSuchElementException: key not found: ….


I tried persisting to disk as well; same thing.


I am using:
pyspark with python3
spark 1.5




Thanks!



EYAD SIBAI
Risk Engineer

iZettle ®
––


Mobile: +46 72 911 60 54
Web: www.izettle.com

Re: java.util.NoSuchElementException: key not found

2015-09-11 Thread Yin Huai
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422; it
has been fixed in branch 1.5, and the 1.5.1 release will include it.

On Fri, Sep 11, 2015 at 3:35 AM, guoqing0...@yahoo.com.hk <
guoqing0...@yahoo.com.hk> wrote:

> Hi all,
> After upgrading Spark to 1.5, Streaming occasionally throws
> java.util.NoSuchElementException: key not found. Could the data itself be
> causing this error? Please help if anyone has seen a similar problem
> before. Thanks very much.
>
> The exception occurs when writing into the database.
>
>
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 
> (TID 76, slave2): java.util.NoSuchElementException: key not found: 
> ruixue.sys.session.request
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>
> at 
> org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
>
> at 
> org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
>
> at 
> org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
>
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
>
> at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
>
> at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
>
> at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
>
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
>
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>
> --
> guoqing0...@yahoo.com.hk
>


java.util.NoSuchElementException: key not found

2015-09-11 Thread guoqing0...@yahoo.com.hk
Hi all,
After upgrading Spark to 1.5, Streaming occasionally throws
java.util.NoSuchElementException: key not found. Could the data itself be
causing this error? Please help if anyone has seen a similar problem before.
Thanks very much.

The exception occurs when writing into the database.


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 
76, slave2): java.util.NoSuchElementException: key not found: 
ruixue.sys.session.request
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at 
org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
at 
org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
at 
org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
at 
org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
at 
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)



guoqing0...@yahoo.com.hk


RE: SparkSql - java.util.NoSuchElementException: key not found: node when accessing JSON Array

2015-03-31 Thread java8964
You can use HiveContext instead of SQLContext; HiveContext supports all of
HiveQL, including LATERAL VIEW explode.
SQLContext does not support that yet.
BTW, nice coding format in the email.
Yong
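
For reference, a minimal sketch of that suggestion, assuming the data is
registered as the table `metric` against the HiveContext (column names are
taken from the mapping shown later in this thread; `metricSchemaRDD` is a
placeholder for however that data is actually loaded):

import org.apache.spark.sql.hive.HiveContext

// sc is the existing SparkContext. The table must be registered against this
// HiveContext (not a separate SQLContext) to be visible to the query below.
val hiveContext = new HiveContext(sc)
metricSchemaRDD.registerTempTable("metric")

hiveContext.sql(
  "SELECT path, `timestamp`, name, value, pe.node, pe.value FROM metric " +
  "LATERAL VIEW explode(pathElements) a AS pe")
  .collect()
  .foreach(println)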

Date: Tue, 31 Mar 2015 18:18:19 -0400
Subject: Re: SparkSql - java.util.NoSuchElementException: key not found: node 
when access JSON Array
From: tsind...@gmail.com
To: user@spark.apache.org

So in looking at this a bit more, I gather the root cause is the fact that the
nested fields are represented as rows within rows, is that correct? If I don't
know the size of the json array (it varies), using x.getAs[Row](0).getString(0)
is not really a valid solution.
Is the solution to apply a lateral view + explode to this?
I have attempted to change to a lateral view, but it looks like my syntax is off:

sqlContext.sql(
  "SELECT path, `timestamp`, name, value, pe.value FROM metric " +
  "lateral view explode(pathElements) a AS pe")
  .collect.foreach(println(_))
Which results in:
15/03/31 17:38:34 INFO ContextCleaner: Cleaned broadcast 0
Exception in thread "main" java.lang.RuntimeException: [1.68] failure:
``UNION'' expected but identifier view found

SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view 
explode(pathElements) a AS pe
   ^
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
at 
org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:174)
at 
org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:173)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:31)
at 
org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
at 
org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:83)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
at 
com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite$.main(ElasticSearchReadWrite.scala:97)
at 
com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite.main(ElasticSearchReadWrite.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is this the right approach?  Is this syntax available in 1.2.1:
SELECT
  v1.name, v2.city, v2.state 
FROM people
  LATERAL VIEW json_tuple(people.jsonObject, 'name', 'address') v1 
 as name, address
  LATERAL VIEW json_tuple(v1.address, 'city', 'state') v2
 as city, state;
-Todd
On Tue, Mar 31, 2015 at 3:26 PM, Todd Nist tsind...@gmail.com wrote:
I am accessing ElasticSearch via

SparkSql - java.util.NoSuchElementException: key not found: node when accessing JSON Array

2015-03-31 Thread Todd Nist
: Removing block broadcast_015/03/31 14:37:49 INFO
MemoryStore: Block broadcast_0 of size 1264 dropped from memory (free
278018576)15/03/31 14:37:49 INFO BlockManager: Removing block
broadcast_0_piece015/03/31 14:37:49 INFO MemoryStore: Block
broadcast_0_piece0 of size 864 dropped from memory (free
278019440)15/03/31 14:37:49 INFO BlockManagerInfo: Removed
broadcast_0_piece0 on 192.168.1.5:57820 in memory (size: 864.0 B,
free: 265.1 MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info
of block broadcast_0_piece015/03/31 14:37:49 INFO BlockManagerInfo:
Removed broadcast_0_piece0 on 192.168.1.5:57834 in memory (size: 864.0
B, free: 530.0 MB)15/03/31 14:37:49 INFO ContextCleaner: Cleaned
broadcast 015/03/31 14:37:49 INFO ScalaEsRowRDD: Reading from
[device/metric]15/03/31 14:37:49 INFO ScalaEsRowRDD: Discovered
mapping {device=[mappings=[metric=[name=STRING, path=STRING,
pathElements=[node=STRING, value=STRING], pathId=STRING,
timestamp=DATE, value=DOUBLE]]]} for [device/metric]15/03/31 14:37:49
INFO SparkContext: Starting job: collect at SparkPlan.scala:8415/03/31
14:37:49 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:84)
with 1 output partitions (allowLocal=false)15/03/31 14:37:49 INFO
DAGScheduler: Final stage: Stage 1(collect at
SparkPlan.scala:84)15/03/31 14:37:49 INFO DAGScheduler: Parents of
final stage: List()15/03/31 14:37:49 INFO DAGScheduler: Missing
parents: List()15/03/31 14:37:49 INFO DAGScheduler: Submitting Stage 1
(MappedRDD[6] at map at SparkPlan.scala:84), which has no missing
parents15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(4120)
called with curMem=0, maxMem=27801944015/03/31 14:37:49 INFO
MemoryStore: Block broadcast_1 stored as values in memory (estimated
size 4.0 KB, free 265.1 MB)15/03/31 14:37:49 INFO MemoryStore:
ensureFreeSpace(2403) called with curMem=4120,
maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: Block
broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB,
free 265.1 MB)15/03/31 14:37:49 INFO BlockManagerInfo: Added
broadcast_1_piece0 in memory on 192.168.1.5:57820 (size: 2.3 KB, free:
265.1 MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of
block broadcast_1_piece015/03/31 14:37:49 INFO SparkContext: Created
broadcast 1 from broadcast at DAGScheduler.scala:83815/03/31 14:37:49
INFO DAGScheduler: Submitting 1 missing tasks from Stage 1
(MappedRDD[6] at map at SparkPlan.scala:84)15/03/31 14:37:49 INFO
TaskSchedulerImpl: Adding task set 1.0 with 1 tasks15/03/31 14:37:49
INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1,
192.168.1.5, NODE_LOCAL, 3731 bytes)15/03/31 14:37:50 INFO
BlockManagerInfo: Added broadcast_1_piece0 in memory on
192.168.1.5:57836 (size: 2.3 KB, free: 530.0 MB)15/03/31 14:37:52 WARN
TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 192.168.1.5):
java.util.NoSuchElementException: key not found: node
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at 
org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:32)
at 
org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaRowValueReader.scala:9)
at 
org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaRowValueReader.scala:16)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.list(ScrollReader.java:560)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:522)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:186)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:165)
at 
org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
at 
org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq

Re: SparkSql - java.util.NoSuchElementException: key not found: node when accessing JSON Array

2015-03-31 Thread Todd Nist
 
 memory (free 278019440)15/03/31 14:37:49 INFO BlockManagerInfo: Removed 
 broadcast_0_piece0 on 192.168.1.5:57820 in memory (size: 864.0 B, free: 265.1 
 MB)15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of block 
 broadcast_0_piece015/03/31 14:37:49 INFO BlockManagerInfo: Removed 
 broadcast_0_piece0 on 192.168.1.5:57834 in memory (size: 864.0 B, free: 530.0 
 MB)15/03/31 14:37:49 INFO ContextCleaner: Cleaned broadcast 015/03/31 
 14:37:49 INFO ScalaEsRowRDD: Reading from [device/metric]15/03/31 14:37:49 
 INFO ScalaEsRowRDD: Discovered mapping 
 {device=[mappings=[metric=[name=STRING, path=STRING, 
 pathElements=[node=STRING, value=STRING], pathId=STRING, timestamp=DATE, 
 value=DOUBLE]]]} for [device/metric]15/03/31 14:37:49 INFO SparkContext: 
 Starting job: collect at SparkPlan.scala:8415/03/31 14:37:49 INFO 
 DAGScheduler: Got job 1 (collect at SparkPlan.scala:84) with 1 output 
 partitions (allowLocal=false)15/03/31 14:37:49 INFO DAGScheduler: Final 
 stage: Stage 1(collect at SparkPlan.scala:84)15/03/31 14:37:49 INFO 
 DAGScheduler: Parents of final stage: List()15/03/31 14:37:49 INFO 
 DAGScheduler: Missing parents: List()15/03/31 14:37:49 INFO DAGScheduler: 
 Submitting Stage 1 (MappedRDD[6] at map at SparkPlan.scala:84), which has no 
 missing parents15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(4120) 
 called with curMem=0, maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: 
 Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 
 265.1 MB)15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(2403) called 
 with curMem=4120, maxMem=27801944015/03/31 14:37:49 INFO MemoryStore: Block 
 broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 
 265.1 MB)15/03/31 14:37:49 INFO BlockManagerInfo: Added broadcast_1_piece0 in 
 memory on 192.168.1.5:57820 (size: 2.3 KB, free: 265.1 MB)15/03/31 14:37:49 
 INFO BlockManagerMaster: Updated info of block broadcast_1_piece015/03/31 
 14:37:49 INFO SparkContext: Created broadcast 1 from broadcast at 
 DAGScheduler.scala:83815/03/31 14:37:49 INFO DAGScheduler: Submitting 1 
 missing tasks from Stage 1 (MappedRDD[6] at map at 
 SparkPlan.scala:84)15/03/31 14:37:49 INFO TaskSchedulerImpl: Adding task set 
 1.0 with 1 tasks15/03/31 14:37:49 INFO TaskSetManager: Starting task 0.0 in 
 stage 1.0 (TID 1, 192.168.1.5, NODE_LOCAL, 3731 bytes)15/03/31 14:37:50 INFO 
 BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.5:57836 
 (size: 2.3 KB, free: 530.0 MB)15/03/31 14:37:52 WARN TaskSetManager: Lost 
 task 0.0 in stage 1.0 (TID 1, 192.168.1.5): java.util.NoSuchElementException: 
 key not found: node
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.MapLike$class.apply(MapLike.scala:141)
 at scala.collection.AbstractMap.apply(Map.scala:58)
 at 
 org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:32)
 at 
 org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaRowValueReader.scala:9)
 at 
 org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaRowValueReader.scala:16)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.list(ScrollReader.java:560)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:522)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:186)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:165)
 at 
 org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
 at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
 at 
 org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157

Re: java.util.NoSuchElementException: key not found:

2015-03-02 Thread Rok Roskar
Aha, OK, thanks.

If I create different RDDs from a parent RDD and force evaluation
thread-by-thread, then it should presumably be fine, correct? Or do I need
to checkpoint the child RDDs as a precaution, in case the parent needs to be
removed from memory and recomputed?
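
For reference, a rough Scala sketch of the pattern being asked about (the
original application was PySpark and its code was not shared, so every name
below is a placeholder). Per the quoted advice that RDDs are not thread-safe,
this only ensures the shared parent is materialized once on the driver thread
and that each worker thread afterwards submits jobs only on its own child RDD;
it is not a guarantee of correctness:

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

// Cache and fully materialize the parent on the driver thread first,
// so no other thread ever triggers evaluation of the shared parent.
val parent = sc.textFile("training-data.txt").cache()
parent.count()

// Each thread then works only on its own child RDD.
val subsets = parent.randomSplit(Array(0.25, 0.25, 0.25, 0.25), seed = 1L)

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(subsets.length))

val jobs = subsets.map(rdd => Future { rdd.count() })  // one independent job per thread
jobs.foreach(f => Await.result(f, Duration.Inf))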

On Sat, Feb 28, 2015 at 4:28 AM, Shixiong Zhu zsxw...@gmail.com wrote:

 RDD is not thread-safe. You should not use it in multiple threads.

 Best Regards,
 Shixiong Zhu

 2015-02-27 23:14 GMT+08:00 rok rokros...@gmail.com:

 I'm seeing this java.util.NoSuchElementException: key not found: exception
 pop up sometimes when I run operations on an RDD from multiple threads in
 a
 python application. It ends up shutting down the SparkContext so I'm
 assuming this is a bug -- from what I understand, I should be able to run
 operations on the same RDD from multiple threads or is this not
 recommended?

 I can't reproduce it all the time and I've tried eliminating caching
 wherever possible to see if that would have an effect, but it doesn't seem
 to. Each thread first splits the base RDD and then runs the
 LogisticRegressionWithSGD on the subset.

 Is there a workaround to this exception?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: java.util.NoSuchElementException: key not found:

2015-02-27 Thread Shixiong Zhu
RDD is not thread-safe. You should not use it in multiple threads.

Best Regards,
Shixiong Zhu

2015-02-27 23:14 GMT+08:00 rok rokros...@gmail.com:

 I'm seeing this java.util.NoSuchElementException: key not found: exception
 pop up sometimes when I run operations on an RDD from multiple threads in a
 python application. It ends up shutting down the SparkContext so I'm
 assuming this is a bug -- from what I understand, I should be able to run
 operations on the same RDD from multiple threads or is this not
 recommended?

 I can't reproduce it all the time and I've tried eliminating caching
 wherever possible to see if that would have an effect, but it doesn't seem
 to. Each thread first splits the base RDD and then runs the
 LogisticRegressionWithSGD on the subset.

 Is there a workaround to this exception?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




java.util.NoSuchElementException: key not found:

2015-02-27 Thread rok
I'm seeing this java.util.NoSuchElementException: key not found: exception
pop up sometimes when I run operations on an RDD from multiple threads in a
python application. It ends up shutting down the SparkContext so I'm
assuming this is a bug -- from what I understand, I should be able to run
operations on the same RDD from multiple threads or is this not recommended? 

I can't reproduce it all the time and I've tried eliminating caching
wherever possible to see if that would have an effect, but it doesn't seem
to. Each thread first splits the base RDD and then runs the
LogisticRegressionWithSGD on the subset.  

Is there a workaround to this exception? 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.util.NoSuchElementException: key not found

2014-09-16 Thread Brad Miller
Hi All,

I suspect I am experiencing a bug. I've noticed that while running
larger jobs, they occasionally die with the exception
java.util.NoSuchElementException: key not found: xyz, where xyz
denotes the ID of some particular task. I've excerpted the log from
one job that died in this way below and attached the full log for
reference.

I suspect that my bug is the same as SPARK-2002 (linked below).  Is
there any reason to suspect otherwise?  Is there any known workaround
other than not coalescing?
https://issues.apache.org/jira/browse/SPARK-2002
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCAMwrk0=d1dww5fdbtpkefwokyozltosbbjqamsqqjowlzng...@mail.gmail.com%3E

Note that I have been coalescing SchemaRDDs using srdd =
SchemaRDD(srdd._jschema_rdd.coalesce(partitions, False, None),
sqlCtx), the workaround described in this thread.
http://mail-archives.apache.org/mod_mbox/spark-user/201409.mbox/%3ccanr-kkciei17m43-yz5z-pj00zwpw3ka_u7zhve2y7ejw1v...@mail.gmail.com%3E

...
14/09/15 21:43:14 INFO scheduler.TaskSetManager: Starting task 78.0 in
stage 551.0 (TID 78738, bennett.research.intel-research.net,
PROCESS_LOCAL, 1056 bytes)
...
14/09/15 21:43:15 INFO storage.BlockManagerInfo: Added
taskresult_78738 in memory on
bennett.research.intel-research.net:38074 (size: 13.0 MB, free: 1560.8
MB)
...
14/09/15 21:43:15 ERROR scheduler.TaskResultGetter: Exception while
getting task result
java.util.NoSuchElementException: key not found: 78738
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at 
org.apache.spark.scheduler.TaskSetManager.handleTaskGettingResult(TaskSetManager.scala:500)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleTaskGettingResult(TaskSchedulerImpl.scala:348)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:52)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)


I am running the pre-compiled 1.1.0 binaries.

best,
-Brad

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org