Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Artemis User
As Mich mentioned, there is no need to use the JDBC API; the
DataFrameWriter's saveAsTable method is the way to go. The JDBC driver is
for a JDBC client (a Java application, for instance) to access the Hive
tables in Spark via the Thrift server interface.


-- ND

On 7/19/21 2:42 AM, Badrinath Patchikolla wrote:

I have been trying to create a table in Hive from Spark itself.

In local mode it works. What I am trying here is, from Spark standalone,
to create a managed table in Hive (another Spark cluster, basically CDH)
using JDBC mode.


When I try that, below is the error I am facing.

On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh,
<mich.talebza...@gmail.com> wrote:


Have you created that table in Hive, or are you trying to create it
from Spark itself?

Your Hive is local. In this case you don't need a JDBC connection.
Have you tried:

df2.write.mode("overwrite").saveAsTable("mydb.mytable")

HTH




**view my Linkedin profile


*Disclaimer:* Use it at your own risk. Any and all responsibility
for any loss, damage or destruction of data or any other property
which may arise from relying on this email's technical content is
explicitly disclaimed. The author will in no case be liable for
any monetary damages arising from such loss, damage or destruction.



On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla
<pbadrinath1...@gmail.com> wrote:

Hi,

Trying to write data from Spark to Hive in JDBC mode; below is
the sample code:

Spark standalone, version 2.4.7:

21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java
classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For
SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040

Spark context available as 'sc' (master =
spark://localhost:7077, app id = app-20210715080414-0817).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java
1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

val df = Seq(
    ("John", "Smith", "London"),
    ("David", "Jones", "India"),
    ("Michael", "Johnson", "Indonesia"),
    ("Chris", "Lee", "Brazil"),
    ("Mike", "Brown", "Russia")
  ).toDF("first_name", "last_name", "country")


 df.write
  .format("jdbc")
  .option("url",
"jdbc:hive2://localhost:1/foundation;AuthMech=2;UseNativeQuery=0")
  .option("dbtable", "test.test")
  .option("user", "admin")
  .option("password", "admin")
  .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
  .mode("overwrite")
  .save


// Exiting paste mode, now interpreting.

java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051)
ERROR processing query/statement. Error Code: 4, SQL
state: TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:39
cannot recognize input near '"first_name"' 'TEXT' ',' in
column name or primary key or foreign key:28:27,

org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,

org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,

org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,

org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,

org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
sun.reflect.GeneratedMethodAccessor13:invoke::-1,

sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
java.lang.reflect.Method:invoke:Method.java:498,

org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,

org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,

org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
java.security.AccessController:doPrivileged:AccessController.java:-2,
 

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Badrinath Patchikolla
I have been trying to create a table in Hive from Spark itself.

In local mode it works. What I am trying here is, from Spark standalone,
to create a managed table in Hive (another Spark cluster, basically CDH)
using JDBC mode.

When I try that, below is the error I am facing.


import yaml fails with Docker or Kubernetes but works OK when run with YARN

2021-07-19 Thread Mich Talebzadeh
Hi,


My environment is set up OK with the packages PySpark needs, including

PyYAML version 5.4.1


In YARN or local mode, a simple skeleton test I have set up picks up yaml.
However, with the Docker image, or when the image is used inside
Kubernetes, it fails.


This is the code used to test


import sys
import os

def main():
    print("\n Printing os stuff")
    p = sys.path
    print("\n Printing p")
    print(p)
    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
    print("\n Printing user_paths")
    print(user_paths)
    print("checking yaml")
    import yaml
    spark_context.stop()

if __name__ == "__main__":
    main()


It checks the OS path and tries to import yaml.


With k8s I get:


spark-submit --verbose \
   --master k8s://$K8S_SERVER \
   --conf "spark.yarn.dist.archives"=hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \
   --deploy-mode cluster \
   --name pytest \
   --conf spark.kubernetes.namespace=spark \
   --conf spark.executor.instances=1 \
   --conf spark.kubernetes.driver.limit.cores=1 \
   --conf spark.executor.cores=1 \
   --conf spark.executor.memory=500m \
   --conf spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1 \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
   --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
   hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}
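One possible culprit, offered as an assumption rather than a verified fix: spark.yarn.dist.archives is honoured only under the YARN resource manager, so on Kubernetes the packed virtualenv above may never be shipped to the driver or executors. Spark 3.1 added a native --archives option that also works on Kubernetes; a sketch along those lines, reusing the paths from the command above (the `environment` alias and the PYSPARK_PYTHON settings are illustrative, not taken from this cluster):

```shell
# Sketch, not a verified fix: ship a packed venv natively on Kubernetes
# (Spark 3.1+). Alternatively, bake the dependency into the image with
# `RUN pip install pyyaml==5.4.1` in the Dockerfile.
spark-submit \
   --master k8s://$K8S_SERVER \
   --deploy-mode cluster \
   --archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#environment \
   --conf spark.kubernetes.driverEnv.PYSPARK_PYTHON=./environment/bin/python \
   --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python \
   --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
   hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}
```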



+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client
"$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf
spark.driver.bindAddress=172.17.0.9 --deploy-mode client --properties-file
/opt/spark/conf/spark.properties --class
org.apache.spark.deploy.PythonRunner
hdfs://50.140.197.220:9000/minikube/codes/testyml.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-07-19 10:20:41,430 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable


 Printing p
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d',
'/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip',
'/opt/spark/python/lib/pyspark.zip',
'/opt/spark/python/lib/py4j-0.10.9-src.zip',
'/opt/spark/jars/spark-core_2.12-3.1.1.jar', '/usr/lib/python37.zip',
'/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload',
'/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

 Printing user_paths
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip',
'/opt/spark/python/lib/pyspark.zip',
'/opt/spark/python/lib/py4j-0.10.9-src.zip',
'/opt/spark/jars/spark-core_2.12-3.1.1.jar']
checking yaml
Traceback (most recent call last):
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 17, in <module>
    main()
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 13, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'


Well, yaml is a bit of an issue here, so I was wondering if anyone has seen
this before?
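A Spark-independent way to check which interpreter can actually see the module is Python's importlib.util.find_spec; a minimal sketch (run it inside the container with the same Python binary the driver uses):

```python
import importlib.util

def diagnose(module_name: str) -> str:
    """Report whether this interpreter can import a module and where it lives."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: NOT importable"
    return f"{module_name}: found at {spec.origin}"

print(diagnose("json"))   # stdlib module, always present
print(diagnose("yaml"))   # present only if PyYAML is installed in this image
```

If yaml reports NOT importable inside the container but importable on the YARN nodes, the package is simply missing from the Docker image rather than from PYTHONPATH.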


Thanks





Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Mich Talebzadeh
Your driver seems to be OK:

 hive_driver: com.cloudera.hive.jdbc41.HS2Driver

However, this is the SQL error you are getting:

Caused by: com.cloudera.hiveserver2.support.exceptions.GeneralException:
[Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error
Code: 4, SQL state: TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
compiling statement: FAILED: ParseException line 1:39 cannot recognize
input near '"first_name"' 'TEXT' ',' in column name or primary key or
foreign key:28


Are you using a reserved word for a table column? What is your DDL for this
table?
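For reference, the ParseException is consistent with Spark's generic JDBC writer emitting ANSI-style DDL (double-quoted identifiers, TEXT columns), which the HiveQL parser rejects: Hive expects backtick-quoted identifiers and the STRING type. A small illustration of the mismatch (the DDL string and the to_hive_ddl helper are hypothetical, for demonstration only, not something the driver provides):

```python
# Hypothetical sketch: the shape of CREATE TABLE that a generic JDBC
# writer emits (ANSI double quotes, TEXT) versus what HiveQL parses
# (backticks, STRING).
def to_hive_ddl(ansi_ddl: str) -> str:
    """Rewrite ANSI-quoted identifiers to backticks and TEXT to STRING."""
    return ansi_ddl.replace('"', '`').replace(' TEXT', ' STRING')

spark_emitted = 'CREATE TABLE test.test ("first_name" TEXT, "last_name" TEXT, "country" TEXT)'
hive_friendly = to_hive_ddl(spark_emitted)
print(hive_friendly)
```

This is why the parser trips exactly at '"first_name"' 'TEXT': the statement never reaches Hive in a dialect it understands, regardless of reserved words.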


HTH






