Thank you Mich Talebzadeh for your answer. It's good to know that VARCHAR
and CHAR are shown properly in Spark 3. Do you know what changed in Spark
3 that made this possible? Or how can I achieve the same output in Spark
2.4.1? If there are any conf options, that would be helpful.

Thanks,
Venkat
2016173438


On Fri, Mar 11, 2022 at 2:06 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hive 3.1.1
> Spark 3.1.1
>
> Your stack overflow issue raised and I quote:
>
> "I have a need to generate DDL statements for Hive tables & views
> programmatically. I tried using Spark and Beeline for this task. Beeline
> takes around 5-10 seconds for each of the statements whereas Spark
> completes the same thing in a few milliseconds. I am planning to use Spark
> since it is faster compared to Beeline. One downside of using Spark for
> getting DDL statements from Hive is that it treats CHAR and VARCHAR
> columns as STRING and doesn't preserve the length information that goes
> with the CHAR/VARCHAR data types, whereas Beeline preserves both the data
> type and the length information for CHAR/VARCHAR. I am using Spark
> 2.4.1 and Beeline 2.1.1.
>
> Given below the sample create table command and its show create table
> output."
>
> Create a simple table in *Hive* in test database
>
> hive> use test;
> OK
> hive> create table etc(ID BIGINT, col1 VARCHAR(30), col2 STRING);
> OK
> hive> desc formatted etc;
> # col_name              data_type               comment
> id                      bigint
> col1                    varchar(30)
> col2                    string
>
> # Detailed Table Information
> Database:               test
> OwnerType:              USER
> Owner:                  hduser
> CreateTime:             Fri Mar 11 18:29:34 GMT 2022
> LastAccessTime:         UNKNOWN
> Retention:              0
> Location:               hdfs://rhes75:9000/user/hive/warehouse/test.db/etc
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"id\":\"true\"}}
>         bucketing_version       2
>         numFiles                0
>         numRows                 0
>         rawDataSize             0
>         totalSize               0
>         transient_lastDdlTime   1647023374
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
>
> Now let's go to spark-shell
> scala> spark.sql("show create table test.etc").show(false)
> +------------------------------------------------+
> |createtab_stmt                                  |
> +------------------------------------------------+
> |CREATE TABLE `test`.`etc` (
>   `id` BIGINT,
>   `col1` VARCHAR(30),
>   `col2` STRING)
> USING text
> TBLPROPERTIES (
>   'bucketing_version' = '2',
>   'transient_lastDdlTime' = '1647023374')
> |
> +------------------------------------------------+
>
> You can see Spark shows columns correctly
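> The same check can be scripted. As a rough sketch (to be run inside
> spark-shell with Hive support; the `test` database is assumed), the loop
> below prints the DDL of every table in a database, which is essentially
> what the original question is after:

```scala
// Sketch: dump the DDL of every table in a database from spark-shell.
// Assumes a SparkSession named `spark` with Hive support (as in spark-shell).
val db = "test"
// "show tables" returns rows of (database, tableName, isTemporary)
val tables = spark.sql(s"show tables in $db").collect().map(_.getString(1))
tables.foreach { t =>
  // Each "show create table" result is a single-row, single-column DataFrame
  val ddl = spark.sql(s"show create table $db.`$t`").collect().head.getString(0)
  println(ddl + ";\n")
}
```

> This needs a live metastore, so timings and exact output will vary with
> the environment.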
>
> Now let us go and create the same table in hive through beeline
>
>
> 0: jdbc:hive2://rhes75:10099/default> use test
> No rows affected (0.019 seconds)
> 0: jdbc:hive2://rhes75:10099/default> create table etc(ID BIGINT, col1 VARCHAR(30), col2 STRING)
> No rows affected (0.304 seconds)
> 0: jdbc:hive2://rhes75:10099/default> desc formatted etc
> +-------------------------------+----------------------------------------------------+------------+
> |           col_name            |                     data_type                      |  comment   |
> +-------------------------------+----------------------------------------------------+------------+
> | # col_name                    | data_type                                          | comment    |
> | id                            | bigint                                             |            |
> | col1                          | varchar(30)                                        |            |
> | col2                          | string                                             |            |
> |                               | NULL                                               | NULL       |
> | # Detailed Table Information  | NULL                                               | NULL       |
> | Database:                     | test                                               | NULL       |
> | OwnerType:                    | USER                                               | NULL       |
> | Owner:                        | hduser                                             | NULL       |
> | CreateTime:                   | Fri Mar 11 18:51:00 GMT 2022                       | NULL       |
> | LastAccessTime:               | UNKNOWN                                            | NULL       |
> | Retention:                    | 0                                                  | NULL       |
> | Location:                     | hdfs://rhes75:9000/user/hive/warehouse/test.db/etc | NULL       |
> | Table Type:                   | MANAGED_TABLE                                      | NULL       |
> | Table Parameters:             | NULL                                               | NULL       |
> |                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"id\":\"true\"}} |
> |                               | bucketing_version                                  | 2          |
> |                               | numFiles                                           | 0          |
> |                               | numRows                                            | 0          |
> |                               | rawDataSize                                        | 0          |
> |                               | totalSize                                          | 0          |
> |                               | transient_lastDdlTime                              | 1647024660 |
> |                               | NULL                                               | NULL       |
> | # Storage Information         | NULL                                               | NULL       |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL       |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL       |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
> | Compressed:                   | No                                                 | NULL       |
> | Num Buckets:                  | -1                                                 | NULL       |
> | Bucket Columns:               | []                                                 | NULL       |
> | Sort Columns:                 | []                                                 | NULL       |
> | Storage Desc Params:          | NULL                                               | NULL       |
> |                               | serialization.format                               | 1          |
> +-------------------------------+----------------------------------------------------+------------+
> 33 rows selected (0.159 seconds)
>
> Now check that in spark-shell again
>
>
> scala> spark.sql("show create table test.etc").show(false)
> +------------------------------------------------+
> |createtab_stmt                                  |
> +------------------------------------------------+
> |CREATE TABLE `test`.`etc` (
>   `id` BIGINT,
>   `col1` VARCHAR(30),
>   `col2` STRING)
> USING text
> TBLPROPERTIES (
>   'bucketing_version' = '2',
>   'transient_lastDdlTime' = '1647024660')
> |
> +------------------------------------------------+
>
>
> It shows OK. So in summary, Spark returns the column definitions exactly
> as you defined them in Hive.
>
>
> Your statement above, and I quote, "I am using Spark 2.4.1 and Beeline
> 2.1.1", refers to older versions of Spark and Hive, which may have had
> such issues.
>
>
> HTH
>
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 11 Mar 2022 at 18:19, Venkatesan Muniappan <
> m.venkatbe...@gmail.com> wrote:
>
>> hi Spark Team,
>>
>> I have raised a question on Spark through Stackoverflow. When you get a
>> chance, can you please take a look and help me ?.
>>
>> https://stackoverflow.com/q/71431757/5927843
>>
>> Thanks,
>> Venkat
>> 2016173438
>>
>
