Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Venkatesan Muniappan
Thanks for the clarification. I will try a plain JDBC connection from
Scala/Java and will update this thread on how it goes.

*Thanks,*
*Venkat*



On Thu, Dec 7, 2023 at 9:40 AM Nicholas Chammas 
wrote:

> PyMySQL has its own implementation
> <https://github.com/PyMySQL/PyMySQL/blob/f13f054abcc18b39855a760a84be0a517f0da658/pymysql/protocol.py>
>  of
> the MySQL client-server protocol. It does not use JDBC.
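
A minimal illustration of that point: PyMySQL opens a plain socket and
speaks the MySQL wire protocol in pure Python, so no JDBC driver or JVM is
involved (host and credentials below are hypothetical):

import pymysql

# Pure-Python MySQL connection; no JDBC driver or JVM is required.
conn = pymysql.connect(host="127.0.0.1", port=3306,
                       user="app_user", password="secret", db="b2b")
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print(cur.fetchone())
conn.close()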
>
>
> On Dec 6, 2023, at 10:43 PM, Venkatesan Muniappan <
> venkatesa...@noonacademy.com> wrote:
>
> Thanks for the advice Nicholas.
>
> As mentioned in the original email, I have tried JDBC + SSH Tunnel using
> pymysql and sshtunnel and it worked fine. The problem happens only with
> Spark.
>
> *Thanks,*
> *Venkat*
>
>
>
> On Wed, Dec 6, 2023 at 10:21 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> This is not a question for the dev list. Moving dev to bcc.
>>
>> One thing I would try is to connect to this database using JDBC + SSH
>> tunnel, but without Spark. That way you can focus on getting the JDBC
>> connection to work without Spark complicating the picture for you.
>>
>>
>> On Dec 5, 2023, at 8:12 PM, Venkatesan Muniappan <
>> venkatesa...@noonacademy.com> wrote:
>>
>> Hi Team,
>>
>> I am facing an issue with SSH tunneling in Apache Spark. The behavior is
>> the same as in this Stack Overflow question
>> <https://stackoverflow.com/questions/68278369/how-to-use-pyspark-to-read-a-mysql-database-using-a-ssh-tunnel>,
>> but there are no answers there.
>>
>> This is what I am trying:
>>
>>
>> with SSHTunnelForwarder(
>>         (ssh_host, ssh_port),
>>         ssh_username=ssh_user,
>>         ssh_pkey=ssh_key_file,
>>         remote_bind_address=(sql_hostname, sql_port),
>>         local_bind_address=(local_host_ip_address, sql_port)) as tunnel:
>>     tunnel.local_bind_port
>>     b1_semester_df = spark.read \
>>         .format("jdbc") \
>>         .option("url", b2b_mysql_url.replace("<>", str(tunnel.local_bind_port))) \
>>         .option("query", b1_semester_sql) \
>>         .option("database", 'b2b') \
>>         .option("password", b2b_mysql_password) \
>>         .option("driver", "com.mysql.cj.jdbc.Driver") \
>>         .load()
>>     b1_semester_df.count()
>>
>> Here, b1_semester_df loads, but when I run count() on the same DataFrame
>> it fails with:
>>
>> 23/12/05 11:49:17 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4
>> times; aborting job
>> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 382, in
>> show
>> print(self._jdf.showString(n, 20, vertical))
>>   File
>> "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line
>> 1257, in __call__
>>   File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
>> return f(*a, **kw)
>>   File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
>> line 328, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> o284.showString.
>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 2.0 (TID 11, ip-172-32-108-1.eu-central-1.compute.internal, executor 3):
>> com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link
>> failure
>>
>> However, the same setup works fine with a pandas DataFrame. I tried the
>> following and it worked:
>>
>>
>> with SSHTunnelForwarder(
>>         (ssh_host, ssh_port),
>>         ssh_username=ssh_user,
>>         ssh_pkey=ssh_key_file,
>>         remote_bind_address=(sql_hostname, sql_port)) as tunnel:
>>     conn = pymysql.connect(host=local_host_ip_address, user=sql_username,
>>                            passwd=sql_password, db=sql_main_database,
>>                            port=tunnel.local_bind_port)
>>     df = pd.read_sql_query(b1_semester_sql, conn)
>>     spark.createDataFrame(df).createOrReplaceTempView("b1_semester")
>>
>> So I wanted to check what I am missing in my Spark usage. Please help.
>>
>> *Thanks,*
>> *Venkat*
>>
>>
>>
>


Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Venkatesan Muniappan
Thanks for the advice Nicholas.

As mentioned in the original email, I have tried JDBC + SSH Tunnel using
pymysql and sshtunnel and it worked fine. The problem happens only with
Spark.

*Thanks,*
*Venkat*



On Wed, Dec 6, 2023 at 10:21 PM Nicholas Chammas 
wrote:

> This is not a question for the dev list. Moving dev to bcc.
>
> One thing I would try is to connect to this database using JDBC + SSH
> tunnel, but without Spark. That way you can focus on getting the JDBC
> connection to work without Spark complicating the picture for you.
>
>
> On Dec 5, 2023, at 8:12 PM, Venkatesan Muniappan <
> venkatesa...@noonacademy.com> wrote:
>
> Hi Team,
>
> I am facing an issue with SSH tunneling in Apache Spark. The behavior is
> the same as in this Stack Overflow question
> <https://stackoverflow.com/questions/68278369/how-to-use-pyspark-to-read-a-mysql-database-using-a-ssh-tunnel>,
> but there are no answers there.
>
> This is what I am trying:
>
>
> with SSHTunnelForwarder(
>         (ssh_host, ssh_port),
>         ssh_username=ssh_user,
>         ssh_pkey=ssh_key_file,
>         remote_bind_address=(sql_hostname, sql_port),
>         local_bind_address=(local_host_ip_address, sql_port)) as tunnel:
>     tunnel.local_bind_port
>     b1_semester_df = spark.read \
>         .format("jdbc") \
>         .option("url", b2b_mysql_url.replace("<>", str(tunnel.local_bind_port))) \
>         .option("query", b1_semester_sql) \
>         .option("database", 'b2b') \
>         .option("password", b2b_mysql_password) \
>         .option("driver", "com.mysql.cj.jdbc.Driver") \
>         .load()
>     b1_semester_df.count()
>
> Here, b1_semester_df loads, but when I run count() on the same DataFrame
> it fails with:
>
> 23/12/05 11:49:17 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4
> times; aborting job
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 382, in show
> print(self._jdf.showString(n, 20, vertical))
>   File
> "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line
> 1257, in __call__
>   File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
> return f(*a, **kw)
>   File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
> line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o284.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 11, ip-172-32-108-1.eu-central-1.compute.internal, executor 3):
> com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link
> failure
>
> However, the same setup works fine with a pandas DataFrame. I tried the
> following and it worked:
>
>
> with SSHTunnelForwarder(
>         (ssh_host, ssh_port),
>         ssh_username=ssh_user,
>         ssh_pkey=ssh_key_file,
>         remote_bind_address=(sql_hostname, sql_port)) as tunnel:
>     conn = pymysql.connect(host=local_host_ip_address, user=sql_username,
>                            passwd=sql_password, db=sql_main_database,
>                            port=tunnel.local_bind_port)
>     df = pd.read_sql_query(b1_semester_sql, conn)
>     spark.createDataFrame(df).createOrReplaceTempView("b1_semester")
>
> So I wanted to check what I am missing in my Spark usage. Please help.
>
> *Thanks,*
> *Venkat*
>
>
>


SSH Tunneling issue with Apache Spark

2023-12-05 Thread Venkatesan Muniappan
Hi Team,

I am facing an issue with SSH tunneling in Apache Spark. The behavior is
the same as in this Stack Overflow question
<https://stackoverflow.com/questions/68278369/how-to-use-pyspark-to-read-a-mysql-database-using-a-ssh-tunnel>,
but there are no answers there.

This is what I am trying:


with SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_pkey=ssh_key_file,
        remote_bind_address=(sql_hostname, sql_port),
        local_bind_address=(local_host_ip_address, sql_port)) as tunnel:
    tunnel.local_bind_port
    b1_semester_df = spark.read \
        .format("jdbc") \
        .option("url", b2b_mysql_url.replace("<>", str(tunnel.local_bind_port))) \
        .option("query", b1_semester_sql) \
        .option("database", 'b2b') \
        .option("password", b2b_mysql_password) \
        .option("driver", "com.mysql.cj.jdbc.Driver") \
        .load()
    b1_semester_df.count()

Here, b1_semester_df loads, but when I run count() on the same DataFrame it
fails with:

23/12/05 11:49:17 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times;
aborting job
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 382, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o284.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 11, ip-172-32-108-1.eu-central-1.compute.internal, executor 3):
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link
failure

However, the same setup works fine with a pandas DataFrame. I have tried
the following and it worked:


with SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_pkey=ssh_key_file,
        remote_bind_address=(sql_hostname, sql_port)) as tunnel:
    conn = pymysql.connect(host=local_host_ip_address, user=sql_username,
                           passwd=sql_password, db=sql_main_database,
                           port=tunnel.local_bind_port)
    df = pd.read_sql_query(b1_semester_sql, conn)
    spark.createDataFrame(df).createOrReplaceTempView("b1_semester")

So I wanted to check what I am missing in my Spark usage. Please help.
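
A likely explanation, for anyone hitting the same error: load() only opens
a JDBC connection from the driver to fetch the schema, and that connection
does go through the tunnel. count(), however, runs tasks on the executors,
and each executor opens its own JDBC connection; the tunnel's local bind
address exists only on the machine running the driver, so executors on
other hosts cannot reach it, hence the communications link failure. That
also explains why the pandas version works: pymysql connects from the
driver only. Below is a sketch of one possible workaround, assuming the
driver host has an IP address the executors can reach (driver_host_ip is
hypothetical and untested; the other names are from the code above):

from sshtunnel import SSHTunnelForwarder

# Hypothetical: an address of the driver host that executors can reach.
driver_host_ip = "172.32.x.x"

with SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_pkey=ssh_key_file,
        remote_bind_address=(sql_hostname, sql_port),
        # Listen on all interfaces, not just loopback, so connections
        # arriving from the executors are forwarded over SSH too.
        local_bind_address=("0.0.0.0", sql_port)) as tunnel:
    url = "jdbc:mysql://{}:{}/b2b".format(driver_host_ip,
                                          tunnel.local_bind_port)
    b1_semester_df = spark.read \
        .format("jdbc") \
        .option("url", url) \
        .option("query", b1_semester_sql) \
        .option("user", sql_username) \
        .option("password", b2b_mysql_password) \
        .option("driver", "com.mysql.cj.jdbc.Driver") \
        .load()
    b1_semester_df.count()

Alternatively, keep all database access on the driver, as the pandas
version above already does, and parallelize afterwards with
spark.createDataFrame(df).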

*Thanks,*
*Venkat*


How Spark establishes connectivity to Hive

2022-03-14 Thread Venkatesan Muniappan
Hi Team,

I wanted to understand how Spark connects to Hive. Does it connect to the
Hive metastore directly, bypassing HiveServer2? Let's say we are inserting
data into a Hive table whose I/O format is Parquet. Does Spark create the
Parquet files from the DataFrame/RDD/Dataset, put them in the table's HDFS
location, and update the metastore about the new files? Or does it simply
run the insert statement on HiveServer2 (through JDBC or some other means)?

We are using Spark 2.4.3 and Hive 2.1.1 in our cluster.

Is there a document that explains this? Please share.
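
For context, the general model (a sketch, not specific to any one
deployment): with Hive support enabled, Spark talks to the Hive metastore
service directly over its thrift interface to read and update table
metadata, and it reads and writes the table's data files itself with its
own execution engine. For an insert into a Parquet-backed Hive table, Spark
writes the Parquet files into the table's HDFS location and then updates
the metastore; it does not submit the insert statement to HiveServer2 over
JDBC. A minimal PySpark illustration (the metastore URI and table names are
hypothetical):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-connectivity-sketch")
    # Point Spark at the Hive metastore thrift service, not HiveServer2.
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# Spark plans and executes this itself: it writes Parquet files into the
# table's warehouse/HDFS directory and registers them with the metastore.
spark.sql("INSERT INTO some_db.some_parquet_table SELECT * FROM some_view")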

Thanks,
Venkat
2016173438


Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-12 Thread Venkatesan Muniappan
Hi,
Does anybody else have a better suggestion for my problem?

Thanks,
Venkat
2016173438


On Fri, Mar 11, 2022 at 4:43 PM Venkatesan Muniappan <
m.venkatbe...@gmail.com> wrote:

> ok. I work for an org where such upgrades take a few months. Not an
> immediate task.
>
> Thanks,
> Venkat
> 2016173438
>
>
> On Fri, Mar 11, 2022 at 4:38 PM Mich Talebzadeh 
> wrote:
>
>> yes in spark 3.1.1. Best to upgrade it to spark 3+.
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Fri, 11 Mar 2022 at 21:35, Venkatesan Muniappan <
>> m.venkatbe...@gmail.com> wrote:
>>
>>> Thank you. I am trying to get the table definition for the existing
>>> tables. BTW, the create and show command that you executed, was it on Spark
>>> 3.x?
>>>
>>> Thanks,
>>> Venkat
>>> 2016173438
>>>
>>>
>>> On Fri, Mar 11, 2022 at 4:28 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Well I do not know what has changed. However, this should not affect
>>>> your work.
>>>>
>>>> Try to create table in Spark
>>>>
>>>> sqltext: String =
>>>>   CREATE TABLE if not exists test.etcs(
>>>>      ID INT
>>>>    , CLUSTERED INT
>>>>    , SCATTERED INT
>>>>    , RANDOMISED INT
>>>>    , RANDOM_STRING VARCHAR(50)
>>>>    , SMALL_VC VARCHAR(10)
>>>>    , PADDING  VARCHAR(4000)
>>>>    , PADDING2 STRING
>>>>   )
>>>>   CLUSTERED BY (ID) INTO 256 BUCKETS
>>>>   STORED AS PARQUET
>>>>   TBLPROPERTIES (
>>>>    "parquet.compress"="SNAPPY"
>>>>   )
>>>>
>>>> scala> spark.sql(sqltext)
>>>> scala> spark.sql("show create table test.etcs").show(false)
>>>>
>>>> +------------------------------------------------------------+
>>>> |createtab_stmt                                              |
>>>> +------------------------------------------------------------+
>>>> |CREATE TABLE `test`.`etcs` (
>>>>   `ID` INT,
>>>>   `CLUSTERED` INT,
>>>>   `SCATTERED` INT,
>>>>   `RANDOMISED` INT,
>>>>   `RANDOM_STRING` VARCHAR(50),
>>>>   `SMALL_VC` VARCHAR(10),
>>>>   `PADDING` VARCHAR(4000),
>>>>   `PADDING2` STRING)
>>>> USING parquet
>>>> CLUSTERED BY (ID)
>>>> INTO 256 BUCKETS
>>>> TBLPROPERTIES (
>>>>   'transient_lastDdlTime' = '1647033659',
>>>>   'parquet.compress' = 'SNAPPY')
>>>> |
>>>> +------------------------------------------------------------+

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
ok. I work for an org where such upgrades take a few months. Not an
immediate task.

Thanks,
Venkat
2016173438


On Fri, Mar 11, 2022 at 4:38 PM Mich Talebzadeh 
wrote:

> yes in spark 3.1.1. Best to upgrade it to spark 3+.
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 11 Mar 2022 at 21:35, Venkatesan Muniappan <
> m.venkatbe...@gmail.com> wrote:
>
>> Thank you. I am trying to get the table definition for the existing
>> tables. BTW, the create and show command that you executed, was it on Spark
>> 3.x?
>>
>> Thanks,
>> Venkat
>> 2016173438
>>
>>
>> On Fri, Mar 11, 2022 at 4:28 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well I do not know what has changed. However, this should not affect
>>> your work.
>>>
>>> Try to create table in Spark
>>>
>>> sqltext: String =
>>>   CREATE TABLE if not exists test.etcs(
>>>      ID INT
>>>    , CLUSTERED INT
>>>    , SCATTERED INT
>>>    , RANDOMISED INT
>>>    , RANDOM_STRING VARCHAR(50)
>>>    , SMALL_VC VARCHAR(10)
>>>    , PADDING  VARCHAR(4000)
>>>    , PADDING2 STRING
>>>   )
>>>   CLUSTERED BY (ID) INTO 256 BUCKETS
>>>   STORED AS PARQUET
>>>   TBLPROPERTIES (
>>>    "parquet.compress"="SNAPPY"
>>>   )
>>>
>>> scala> spark.sql(sqltext)
>>> scala> spark.sql("show create table test.etcs").show(false)
>>>
>>> +------------------------------------------------------------+
>>> |createtab_stmt                                              |
>>> +------------------------------------------------------------+
>>> |CREATE TABLE `test`.`etcs` (
>>>   `ID` INT,
>>>   `CLUSTERED` INT,
>>>   `SCATTERED` INT,
>>>   `RANDOMISED` INT,
>>>   `RANDOM_STRING` VARCHAR(50),
>>>   `SMALL_VC` VARCHAR(10),
>>>   `PADDING` VARCHAR(4000),
>>>   `PADDING2` STRING)
>>> USING parquet
>>> CLUSTERED BY (ID)
>>> INTO 256 BUCKETS
>>> TBLPROPERTIES (
>>>   'transient_lastDdlTime' = '1647033659',
>>>   'parquet.compress' = 'SNAPPY')
>>> |
>>> +------------------------------------------------------------+
>>>
>>>
>>> Note that columns are OK.
>>>
>>>
>>> Also check this link for the differences between CHAR, VARCHAR and
>>> STRING types in Hive
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/hive/languagemanual+types
>>>
>>>
>>> HTH
>>>
>>>
>>>
>>>   view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, dam

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
Thank you. I am trying to get the table definition for the existing tables.
BTW, the create and show command that you executed, was it on Spark 3.x?

Thanks,
Venkat
2016173438


On Fri, Mar 11, 2022 at 4:28 PM Mich Talebzadeh 
wrote:

> Well I do not know what has changed. However, this should not affect your
> work.
>
> Try to create table in Spark
>
> sqltext: String =
>   CREATE TABLE if not exists test.etcs(
>      ID INT
>    , CLUSTERED INT
>    , SCATTERED INT
>    , RANDOMISED INT
>    , RANDOM_STRING VARCHAR(50)
>    , SMALL_VC VARCHAR(10)
>    , PADDING  VARCHAR(4000)
>    , PADDING2 STRING
>   )
>   CLUSTERED BY (ID) INTO 256 BUCKETS
>   STORED AS PARQUET
>   TBLPROPERTIES (
>    "parquet.compress"="SNAPPY"
>   )
>
> scala> spark.sql(sqltext)
> scala> spark.sql("show create table test.etcs").show(false)
>
> +------------------------------------------------------------+
> |createtab_stmt                                              |
> +------------------------------------------------------------+
> |CREATE TABLE `test`.`etcs` (
>   `ID` INT,
>   `CLUSTERED` INT,
>   `SCATTERED` INT,
>   `RANDOMISED` INT,
>   `RANDOM_STRING` VARCHAR(50),
>   `SMALL_VC` VARCHAR(10),
>   `PADDING` VARCHAR(4000),
>   `PADDING2` STRING)
> USING parquet
> CLUSTERED BY (ID)
> INTO 256 BUCKETS
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1647033659',
>   'parquet.compress' = 'SNAPPY')
> |
> +------------------------------------------------------------+
>
>
> Note that columns are OK.
>
>
> Also check this link for the differences between CHAR, VARCHAR and STRING
> types in Hive
>
>
> https://cwiki.apache.org/confluence/display/hive/languagemanual+types
>
>
> HTH
>
>
>
>   view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 11 Mar 2022 at 20:55, Venkatesan Muniappan <
> m.venkatbe...@gmail.com> wrote:
>
>> Thank you Mich Talebzadeh for your answer. It's good to know that VARCHAR
>> and CHAR are properly showing in Spark 3. Do you know what changed in Spark
>> 3 that made this possible? Or how can I achieve the same output in Spark
>> 2.4.1? If there are some conf options, that would be helpful.
>>
>> Thanks,
>> Venkat
>> 2016173438
>>
>>
>> On Fri, Mar 11, 2022 at 2:06 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hive 3.1.1
>>> Spark 3.1.1
>>>
>>> The Stack Overflow issue you raised, and I quote:
>>>
>>> "I have a need to generate DDL statements for Hive tables & views
>>> programmatically. I tried using Spark and Beeline for this task. Beeline
>>> takes around 5-10 seconds for each of the statements whereas Spark
>>> completes the same thing in a few milliseconds. I am planning to use Spark
>>> since it is faster compared to beeline. One downside of using spark for
>>> getting DDL statements from the hive is, it treats CHAR, VARCHAR characters
>>> as String and it doesn't preserve the length information that goes with
>>> CHAR,VARCHAR data types. At the same time beeline preserves the data type
>>> and the length information for CHAR,VARCHAR data types. *I am using
>>> Spark 2.4.1 and Beeline 2.1.1.*
>>>
>>> Given below the sample create table command and its show create table
>>> output."

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
>   |
>
> +------------------------------------------------------------+
> |CREATE TABLE `test`.`etc` (
> *  `id` BIGINT,*
> *  `col1` VARCHAR(30),*
> *  `col2` STRING)*
> USING text
> TBLPROPERTIES (
>   'bucketing_version' = '2',
>   'transient_lastDdlTime' = '1647024660')
> |
>
>
> +------------------------------------------------------------+
>
>
> It shows OK. So in summary you get the column definitions in Spark as you
> have defined them in Hive.
>
>
> In your statement above, the quote "I am using Spark 2.4.1 and Beeline
> 2.1.1" refers to older versions of Spark and Hive which may have had
> such issues.
>
>
> HTH
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 11 Mar 2022 at 18:19, Venkatesan Muniappan <
> m.venkatbe...@gmail.com> wrote:
>
>> Hi Spark Team,
>>
>> I have raised a question about Spark on Stack Overflow. When you get a
>> chance, can you please take a look and help me?
>>
>> https://stackoverflow.com/q/71431757/5927843
>>
>> Thanks,
>> Venkat
>> 2016173438
>>
>


Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
Hi Spark Team,

I have raised a question about Spark on Stack Overflow. When you get a
chance, can you please take a look and help me?

https://stackoverflow.com/q/71431757/5927843

Thanks,
Venkat
2016173438