Re: Re: HiveContext setConf seems not stable

2015-04-23 Thread guoqing0...@yahoo.com.hk
Hi all,
My understanding of this problem is that SQLConf gets overwritten by the Hive
config during the initialization phase, when setConf(key: String, value: String)
is called for the first time, as in the code snippets below, so only later calls
take effect correctly. I'm not sure whether this is right; any input is welcome. Thanks.

@transient protected[hive] lazy val hiveconf: HiveConf = {
  setConf(sessionState.getConf.getAllProperties)
  sessionState.getConf
}

protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String] = synchronized {
  try {
    val cmd_trimmed: String = cmd.trim()
    val tokens: Array[String] = cmd_trimmed.split("\\s+")
    val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
    ...
}

protected[sql] def runSqlHive(sql: String): Seq[String] = {
  val maxResults = 10
  val results = runHive(sql, maxResults)
  // It is very confusing when you only get back some of the results...
  if (results.size == maxResults) sys.error("RESULTS POSSIBLY TRUNCATED")
  results
}

override def setConf(key: String, value: String): Unit = {
  super.setConf(key, value)
  runSqlHive(s"SET $key=$value")
}
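
If that reading is correct, whatever the first setConf call touches is clobbered once
when the lazy hiveconf pulls Hive's defaults back into SQLConf, and every later call
sticks. Below is a minimal, untested sketch of the workaround this implies (the same
one Hao Ren shows in snippet 3 further down: issue the setConf twice so the second
call lands after the one-time overwrite); the app name and master are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object WarehouseDirWorkaround extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("warehouse-dir-workaround").setMaster("local[2]"))
  val hc = new HiveContext(sc)

  // The first setConf triggers the lazy hiveconf, which copies Hive's defaults
  // back into SQLConf and overwrites the value just set.
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  // The second call runs after that one-time overwrite, so the value sticks.
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")

  hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
}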
 
From: madhu phatak
Date: 2015-04-23 02:17
To: Michael Armbrust
CC: Ophir Cohen; Hao Ren; user
Subject: Re: HiveContext setConf seems not stable
Hi,
Calling getConf doesn't solve the issue. Even many Hive-specific queries are
broken. It seems that no Hive configurations are being passed through properly.




Regards,
Madhukara Phatak
http://datamantra.io/

On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust mich...@databricks.com 
wrote:
As a workaround, can you call getConf first before any setConf?
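
(Read literally, that suggestion would look roughly like the untested sketch below;
it simply reads one conf value before the first setConf, in the hope that this forces
the one-time sync of Hive defaults up front. As madhu notes above, it did not help in
his case.)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object GetConfFirst extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("getconf-first").setMaster("local[2]"))
  val hc = new HiveContext(sc)

  // Read any conf value before the first setConf; the idea is to trigger the
  // one-time sync of Hive defaults into SQLConf before our own values go in.
  hc.getConf("hive.exec.compress.output", "false")

  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
}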

On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen oph...@gmail.com wrote:
I think I've encountered the same problem; I'm trying to turn on Hive output
compression.
I have the following lines:
def initHiveContext(sc: SparkContext): HiveContext = {
  val hc: HiveContext = new HiveContext(sc)
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
    "org.apache.hadoop.io.compress.SnappyCodec")
  hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")

  logger.info(hc.getConf("hive.exec.compress.output"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))

  hc
}
And the log for calling it twice:
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: 
org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: 
org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK

BTW
It worked on 1.2.1...


On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren inv...@gmail.com wrote:
Hi,

Jira created: https://issues.apache.org/jira/browse/SPARK-6675

Thank you.


On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust mich...@databricks.com wrote:
Can you open a JIRA please?

On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren inv...@gmail.com wrote:
Hi,

I find HiveContext.setConf does not work correctly. Here are some code snippets 
showing the problem:

snippet 1:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {

  val conf = new SparkConf()
    .setAppName("context-test")
    .setMaster("local[8]")
  val sc = new SparkContext(conf)
  val hc = new HiveContext(sc)

  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
}


Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
(spark.sql.shuffle.partitions,10)

snippet 2:

...
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
...


Results:
(hive.metastore.warehouse.dir,/user/hive/warehouse)
(spark.sql.shuffle.partitions,10)

You can see that I just permuted the two setConf calls, and that leads to two
different Hive configurations.
It seems that HiveContext cannot set a new value for the
hive.metastore.warehouse.dir key in the first setConf call.
You need another setConf call before changing hive.metastore.warehouse.dir;
for example, set hive.metastore.warehouse.dir twice, as in snippet 1.

snippet 3:

...
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
...

Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)

You can reproduce this on the latest branch-1.3 (1.3.1-snapshot, htag =
7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).

I have also tested the released 1.3.0 (htag =
4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.

Please tell me if I am missing something. Any help is highly appreciated.

Hao

--
Hao Ren

{Data, Software} Engineer @ ClaraVista

Paris, France

Re: HiveContext setConf seems not stable

2015-04-21 Thread Michael Armbrust
As a workaround, can you call getConf first before any setConf?

Re: HiveContext setConf seems not stable

2015-04-02 Thread Hao Ren
Hi,

Jira created: https://issues.apache.org/jira/browse/SPARK-6675

Thank you.

Re: HiveContext setConf seems not stable

2015-04-01 Thread Michael Armbrust
Can you open a JIRA please?