Many thanks Ayan.

I tried that as well, as follows:

val broadcastValue = "123456789"  // I assume this will be sent as a constant for the batch
val df = spark.read.
                format("com.databricks.spark.xml").
                option("rootTag", "hierarchy").
                option("rowTag", "sms_request").
                load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastId", lit(broadcastValue))
So the column broadcastId is meant to be a static partition in the Hive table, whereas the other column, brand, is treated as a dynamic partition.

newDF.createOrReplaceTempView("tmp")
// Need to create and populate the target Parquet table michtest.BroadcastStaging
//
HiveContext.sql("""DROP TABLE IF EXISTS michtest.BroadcastStaging""")

  var sqltext = """
  CREATE TABLE IF NOT EXISTS michtest.BroadcastStaging (
     partyId STRING
   , phoneNumber STRING
  )
  PARTITIONED BY (
     broadcastId STRING
   , brand STRING
)
  STORED AS PARQUET
  """
  HiveContext.sql(sqltext)
  //
  // Put data in Hive table
  //
     // Dynamic partitioning is disabled by default. We turn it on
     //spark.sql("SET hive.exec.dynamic.partition = true")
     spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
     // spark.sql("SET hive.exec.max.dynamic.partitions.pernode = 400")

  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = broadcastValue, brand)
  SELECT
          ocis_party_id AS partyId
        , target_mobile_no AS phoneNumber
        , brand
  FROM tmp
  """

org.apache.spark.sql.catalyst.parser.ParseException:
missing STRING at ','(line 2, pos 85)

== SQL ==

  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
broadcastValue, brand)
-------------------------------------------------------------------------------------^^^
  SELECT
          ocis_party_id AS partyId
        , target_mobile_no AS phoneNumber
       , brand
  FROM tmp

The thing is that if I pass (broadcastId = "123456789", brand) it works
with no problem!
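[For anyone hitting the same parse error: the PARTITION clause requires a string literal for a static partition value, so the Scala variable has to be substituted into the SQL text before Spark parses it. Below is a minimal sketch using Scala's s-interpolator, assuming the tmp view and the michtest.BroadcastStaging table exist as above:]

```scala
val broadcastValue = "123456789"

// Dynamic partitioning must be enabled for the dynamic column (brand)
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

// The s-interpolator substitutes the variable before the SQL is parsed,
// so the parser sees a quoted literal rather than the identifier broadcastValue
val sqltext = s"""
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = '$broadcastValue', brand)
  SELECT
          ocis_party_id AS partyId
        , target_mobile_no AS phoneNumber
        , brand
  FROM tmp
  """
spark.sql(sqltext)
```

The same interpolation works for the fully static case, e.g. PARTITION (broadcastId = '$broadcastValue', brand = 'dummy'), since by the time spark.sql sees the string it contains only literals.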

Regards,

Dr Mich Talebzadeh



LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 16 Apr 2020 at 13:05, ayan guha <guha.a...@gmail.com> wrote:

> Hi Mitch
>
> Add it in the DF first
>
> from pyspark.sql.functions import lit
>
> df = df.withColumn('broadcastId', lit(broadcastValue))
>
> Then you will be able to access the column in the temp view
>
> Re: Partitioning, DataFrame.write also supports partitionBy clause and you
> can use it along with saveAsTable.
>
>
> On Thu, Apr 16, 2020 at 9:47 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks Zhang,
>>
>> That does not work. I need to pass the value of the variable
>> broadcastValue; Spark cannot interpret it inside the string.
>>
>>  scala>   sqltext = """
>>      |   INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>> (broadcastId = broadcastValue, brand = "dummy")
>>      |   SELECT
>>      |           ocis_party_id AS partyId
>>      |         , target_mobile_no AS phoneNumber
>>      |   FROM tmp
>>      |   """
>> sqltext: String =
>>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
>> broadcastValue, brand = "dummy")
>>   SELECT
>>           ocis_party_id AS partyId
>>         , target_mobile_no AS phoneNumber
>>   FROM tmp
>>
>>
>> scala>   spark.sql(sqltext)
>> org.apache.spark.sql.catalyst.parser.ParseException:
>> missing STRING at ','(line 2, pos 85)
>>
>> == SQL ==
>>
>>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
>> broadcastValue, brand = "dummy")
>>
>> -------------------------------------------------------------------------------------^^^
>>   SELECT
>>           ocis_party_id AS partyId
>>         , target_mobile_no AS phoneNumber
>>   FROM tmp
>>
>>
>>   at
>> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
>>   at
>> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
>>   at
>> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>>   at
>> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
>>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
>>   ... 55 elided
>>
>>
>>
>>
>>
>>
>>
>> On Thu, 16 Apr 2020 at 12:26, ZHANG Wei <wezh...@outlook.com> wrote:
>>
>>> > scala>   spark.sql($sqltext)
>>> > <console>:41: error: not found: value $sqltext
>>> >          spark.sql($sqltext)
>>>                      ^
>>>                      +-- should be Scala language
>>>
>>> Try this:
>>>
>>> scala> spark.sql(sqltext)
>>>
>>> --
>>> Cheers,
>>> -z
>>>
>>> On Thu, 16 Apr 2020 08:49:40 +0100
>>> Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> > I have a variable to be passed to a column of partition as shown below
>>> >
>>> > val broadcastValue = "123456789"  // I assume this will be sent as a
>>> > constant for the batch
>>> > // Create a DF on top of XML
>>> >
>>> > df.createOrReplaceTempView("tmp")
>>> > // Need to create and populate target Parquet table
>>> > michtest.BroadcastStaging
>>> > //
>>> > HiveContext.sql("""DROP TABLE IF EXISTS michtest.BroadcastStaging""")
>>> >
>>> >   var sqltext = """
>>> >   CREATE TABLE IF NOT EXISTS michtest.BroadcastStaging (
>>> >      partyId STRING
>>> >    , phoneNumber STRING
>>> >   )
>>> >   PARTITIONED BY (
>>> >      broadcastId STRING
>>> >    , brand STRING)
>>> >   STORED AS PARQUET
>>> >   """
>>> >   HiveContext.sql(sqltext)
>>> >
>>> > // Now insert the data from temp table
>>> >   //
>>> >   // Put data in Hive table
>>> >   //
>>> >      // Dynamic partitioning is disabled by default. We turn it on
>>> >      spark.sql("SET hive.exec.dynamic.partition = true")
>>> >      spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict ")
>>> >
>>> >   sqltext = """
>>> >
>>> >   $INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
>>> > $broadcastValue, brand = "dummy")
>>> >   SELECT
>>> >           ocis_party_id AS partyId
>>> >         , target_mobile_no AS phoneNumber
>>> >   FROM tmp
>>> >   """
>>> >   spark.sql($sqltext)
>>> >
>>> >
>>> > However, this does not work!
>>> >
>>> >
>>> > scala>   sqltext = """
>>> >      |   $INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>>> > (broadcastId = $broadcastValue, brand = "dummy")
>>> >      |   SELECT
>>> >      |           ocis_party_id AS partyId
>>> >      |         , target_mobile_no AS phoneNumber
>>> >      |   FROM tmp
>>> >      |   """
>>> > sqltext: String =
>>> >   $INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
>>> > $broadcastValue, brand = "dummy")
>>> >   SELECT
>>> >           ocis_party_id AS partyId
>>> >         , target_mobile_no AS phoneNumber
>>> >   FROM tmp
>>> >
>>> >
>>> > scala>   spark.sql($sqltext)
>>> > <console>:41: error: not found: value $sqltext
>>> >          spark.sql($sqltext)
>>> >
>>> >
>>> > Any ideas?
>>> >
>>> >
>>> > Thanks
>>> >
>>> >
>>>
>>
>
> --
> Best Regards,
> Ayan Guha
>
