[ https://issues.apache.org/jira/browse/SPARK-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-9625:
-----------------------------
    Component/s: SQL

Did you say this is reproducible -- how do you do it?

> SparkILoop creates sql context continuously, thousands of times
> ---------------------------------------------------------------
>
>                 Key: SPARK-9625
>                 URL: https://issues.apache.org/jira/browse/SPARK-9625
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 1.4.1
>         Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>              Labels: sql
>
> Occasionally but repeatably, depending on the Spark SQL operations being
> run, {{spark-shell}} gets into a state where it attempts to create a sql
> context over and over again as it does its work. Example output below:
> {code}
> 15/08/05 03:04:12 INFO DAGScheduler: looking for newly runnable stages
> 15/08/05 03:04:12 INFO DAGScheduler: running: Set()
> 15/08/05 03:04:12 INFO DAGScheduler: waiting: Set(ShuffleMapStage 7, ResultStage 8)
> 15/08/05 03:04:12 INFO DAGScheduler: failed: Set()
> 15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ShuffleMapStage 7: List()
> 15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ResultStage 8: List(ShuffleMapStage 7)
> 15/08/05 03:04:12 INFO DAGScheduler: Submitting ShuffleMapStage 7 (MapPartitionsRDD[49] at map at <console>:474), which is now runnable
> 15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(47840) called with curMem=685306, maxMem=26671746908
> 15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 46.7 KB, free 24.8 GB)
> 15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(15053) called with curMem=733146, maxMem=26671746908
> 15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 14.7 KB, free 24.8 GB)
> 15/08/05 03:04:12 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on localhost:39451 (size: 14.7 KB, free: 24.8 GB)
> 15/08/05 03:04:12 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:874
> 15/08/05 03:04:12 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 7 (MapPartitionsRDD[49] at map at <console>:474)
> 15/08/05 03:04:12 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
> 15/08/05 03:04:12 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 684, localhost, PROCESS_LOCAL, 1461 bytes)
> 15/08/05 03:04:12 INFO Executor: Running task 0.0 in stage 7.0 (TID 684)
> 15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Getting 214 non-empty blocks out of 214 blocks
> 15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
> 15/08/05 03:04:12 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO HiveMetaStore: No user is added in admin role, since config is empty
> 15/08/05 03:04:13 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> 15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
> SQL context available as sqlContext.
> {code}
> In the end, the job succeeded and produced correct output, but it ran many
> times slower than it should have.
> Longer output, showing job completion, can be found
> [here|https://gist.github.com/ssimeonov/8064f5f77897bb337157]. In the
> example, the sql context was created about 2,500 times. I've had runs where
> it's been over 100,000 times.
> Could it be that an exception silently generated by the job is causing this?
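The repeated "Initializing execution hive" / "Created sql context" pairs in the log show the context initializer running once per trigger rather than once per session. As an illustration only (hypothetical names, not Spark's actual `SparkILoop` code), this Scala sketch shows the idempotent-initialization behavior one would expect from a `lazy val`: the initializer body runs at most once, no matter how many times the value is referenced.

```scala
object ContextDemo {
  // Counts how many times the (expensive) initializer actually runs.
  // In the reported bug, the equivalent counter would reach the thousands.
  var initCount = 0

  // `lazy val` makes initialization idempotent: the body executes on the
  // first access only, and every later access returns the cached value.
  lazy val sqlContext: String = {
    initCount += 1
    "sql context (with Hive support)"
  }
}
```

Referencing `ContextDemo.sqlContext` any number of times leaves `initCount` at 1, which is the behavior the shell's log output contradicts.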