Spark Streaming Standalone 1.5 - Stage cancelled because SparkContext was shut down

2015-09-30 Thread tranan
Hello All,

I have several Spark Streaming applications running in standalone mode on
Spark 1.5.  Spark is currently set up for dynamic resource allocation.  The
issue I am seeing is that I can have about 12 Spark Streaming jobs running
concurrently.  Occasionally, more than half of them fail with "Stage
cancelled because SparkContext was shut down".  They restart automatically,
as they run in supervised mode.  Attached is a screenshot of one of the
failed jobs.  Does anyone have any insight into what is going on?
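For reference, a minimal sketch of how such a job might be submitted on a standalone cluster with dynamic allocation and supervision enabled. The master URL and jar name are placeholders, not taken from the original post:

```shell
# Hypothetical submit command; master URL and jar name are placeholders.
# --supervise only takes effect with --deploy-mode cluster on a standalone master.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  streaming-app.jar
```

Note that dynamic allocation requires the external shuffle service to be running on each worker, which is why `spark.shuffle.service.enabled` is set alongside it.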

 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Standalone-1-5-Stage-cancelled-because-SparkContext-was-shut-down-tp24885.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark SQL ArrayOutofBoundsException Question

2015-07-28 Thread tranan
Hello all,

I am currently getting an error when using Spark SQL to access Elasticsearch
via the Elasticsearch Spark integration.  Below is the series of commands I
issued along with the stacktrace.  I am unclear what the error could mean.  I
can print the schema correctly, but it errors out if I try to display a few
results.  Can you point me in the right direction?

scala> val reddit_comment_df = sqlContext.read.format("org.elasticsearch.spark.sql").options(esOptions).load("reddit_comment_public-201507-v3/default")

scala> reddit_comment_df.registerTempTable("reddit_comment")
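The esOptions map is not shown in the transcript; a minimal sketch of what it might contain, assuming a local Elasticsearch node. The host and port values here are placeholders, not from the original post:

```scala
// Hypothetical connection settings for the elasticsearch-spark connector;
// the host and port are placeholders for this sketch.
val esOptions = Map(
  "es.nodes" -> "localhost", // Elasticsearch node(s) to connect to
  "es.port"  -> "9200"       // HTTP port of the Elasticsearch REST endpoint
)
```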

scala> reddit_comment_df.printSchema

root
 |-- data: struct (nullable = true)
 |    |-- archived: boolean (nullable = true)
 |    |-- author: string (nullable = true)
 |    |-- author_flair_css_class: string (nullable = true)
 |    |-- author_flair_text: string (nullable = true)
 |    |-- body: string (nullable = true)
 |    |-- body_html: string (nullable = true)
 |    |-- controversiality: long (nullable = true)
 |    |-- created: long (nullable = true)
 |    |-- created_utc: long (nullable = true)
 |    |-- distinguished: string (nullable = true)
 |    |-- downs: long (nullable = true)
 |    |-- edited: long (nullable = true)
 |    |-- gilded: long (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- link_author: string (nullable = true)
 |    |-- link_id: string (nullable = true)
 |    |-- link_title: string (nullable = true)
 |    |-- link_url: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- parent_id: string (nullable = true)
 |    |-- replies: string (nullable = true)
 |    |-- saved: boolean (nullable = true)
 |    |-- score: long (nullable = true)
 |    |-- score_hidden: boolean (nullable = true)
 |    |-- subreddit: string (nullable = true)
 |    |-- subreddit_id: string (nullable = true)
 |    |-- ups: long (nullable = true)

scala> reddit_comment_df.show

15/07/27 20:38:31 INFO ScalaEsRowRDD: Reading from
[reddit_comment_public-201507-v3/default]
15/07/27 20:38:31 INFO ScalaEsRowRDD: Discovered mapping
{reddit_comment_public-201507-v3=[mappings=[default=[acquire_date=DATE,
elasticsearch_date_partition_index=STRING,
elasticsearch_language_partition_index=STRING, elasticsearch_type=STRING,
source=[data=[archived=BOOLEAN, author=STRING,
author_flair_css_class=STRING, author_flair_text=STRING, body=STRING,
body_html=STRING, controversiality=LONG, created=LONG, created_utc=LONG,
distinguished=STRING, downs=LONG, edited=LONG, gilded=LONG, id=STRING,
link_author=STRING, link_id=STRING, link_title=STRING, link_url=STRING,
name=STRING, parent_id=STRING, replies=STRING, saved=BOOLEAN, score=LONG,
score_hidden=BOOLEAN, subreddit=STRING, subreddit_id=STRING, ups=LONG],
kind=STRING], source_geo_location=GEO_POINT, source_id=STRING,
source_language=STRING, source_time=DATE]]]} for
[reddit_comment_public-201507-v3/default]

15/07/27 20:38:31 INFO SparkContext: Starting job: show at <console>:26
15/07/27 20:38:31 INFO DAGScheduler: Got job 13 (show at <console>:26) with
1 output partitions (allowLocal=false)
15/07/27 20:38:31 INFO DAGScheduler: Final stage: ResultStage 16 (show at
<console>:26)
15/07/27 20:38:31 INFO DAGScheduler: Parents of final stage: List()
15/07/27 20:38:31 INFO DAGScheduler: Missing parents: List()
15/07/27 20:38:31 INFO DAGScheduler: Submitting ResultStage 16
(MapPartitionsRDD[65] at show at <console>:26), which has no missing parents
15/07/27 20:38:31 INFO MemoryStore: ensureFreeSpace(7520) called with
curMem=71364, maxMem=2778778828
15/07/27 20:38:31 INFO MemoryStore: Block broadcast_13 stored as values in
memory (estimated size 7.3 KB, free 2.6 GB)
15/07/27 20:38:31 INFO MemoryStore: ensureFreeSpace(3804) called with
curMem=78884, maxMem=2778778828
15/07/27 20:38:31 INFO MemoryStore: Block broadcast_13_piece0 stored as
bytes in memory (estimated size 3.7 KB, free 2.6 GB)
15/07/27 20:38:31 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on 172.25.185.239:58296 (size: 3.7 KB, free: 2.6 GB)
15/07/27 20:38:31 INFO SparkContext: Created broadcast 13 from broadcast at
DAGScheduler.scala:874
15/07/27 20:38:31 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 16 (MapPartitionsRDD[65] at show at <console>:26)
15/07/27 20:38:31 INFO TaskSchedulerImpl: Adding task set 16.0 with 1 tasks
15/07/27 20:38:31 INFO FairSchedulableBuilder: Added task set TaskSet_16
tasks to pool default
15/07/27 20:38:31 INFO TaskSetManager: Starting task 0.0 in stage 16.0 (TID
172, 172.25.185.164, ANY, 5085 bytes)
15/07/27 20:38:31 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on 172.25.185.164:50275 (size: 3.7 KB, free: 3.6 GB)
15/07/27 20:38:31 WARN TaskSetManager: Lost task 0.0 in stage 16.0 (TID 172,
172.25.185.164): java.lang.ArrayIndexOutOfBoundsException: -1

	at scala.collection.mutable.ResizableArray$class.update(ResizableArray.scala:49)
	at scala.collection.mutable.ArrayBuffer.update(ArrayBuffer.scala:47)
	at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:29)
	at