HAO REN created SPARK-6675:
------------------------------

             Summary: HiveContext setConf seems not stable
                 Key: SPARK-6675
                 URL: https://issues.apache.org/jira/browse/SPARK-6675
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0
         Environment: AWS ec2 xlarge2 cluster launched by spark's script
            Reporter: HAO REN


I have found that HiveContext.setConf does not work correctly. Here are some code 
snippets demonstrating the problem:

snippet 1:
----------------------------------------------------------------------------------------------------------------
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {

  val conf = new SparkConf()
    .setAppName("context-test")
    .setMaster("local[8]")
  val sc = new SparkContext(conf)
  val hc = new HiveContext(sc)

  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
}
----------------------------------------------------------------------------------------------------------------

Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
(spark.sql.shuffle.partitions,10)

snippet 2:
----------------------------------------------------------------------------------------------------------------
...
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
...
----------------------------------------------------------------------------------------------------------------

Results:
(hive.metastore.warehouse.dir,/user/hive/warehouse)
(spark.sql.shuffle.partitions,10)

As you can see, I merely swapped the order of the two setConf calls, and that 
leads to two different Hive configurations.
It seems that HiveContext cannot set a new value for the 
"hive.metastore.warehouse.dir" key when that is the first "setConf" call.
You need another "setConf" call before changing "hive.metastore.warehouse.dir". 
For example, setting "hive.metastore.warehouse.dir" twice works, just like 
snippet 1:

snippet 3:
----------------------------------------------------------------------------------------------------------------
...
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
...
----------------------------------------------------------------------------------------------------------------

Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)


You can reproduce this on the latest branch-1.3 (1.3.1-SNAPSHOT, 
commit 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).

I have also tested the released 1.3.0 (commit 
4aaf48d46d13129f0f9bdafd771dd80fe568a7dc); it has the same problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
