Hello All,
The Apache Phoenix homepage lists two additional integrations: Apache Spark
Integration and the Phoenix Storage Handler for Apache Hive.
Following the guides, I can query a Phoenix table from the Beeline CLI, and I
can load a Phoenix table as a DataFrame using Spark SQL.
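For reference, the direct Spark load mentioned above can be sketched as follows. The helper names are mine, and the table name and ZooKeeper URL in the usage example are placeholders; the data source name `org.apache.phoenix.spark` is the one from the Phoenix Spark integration docs:

```python
def phoenix_read_options(table, zk_url):
    """Option keys understood by the phoenix-spark connector."""
    return {"table": table, "zkUrl": zk_url}


def load_phoenix_table(spark, table, zk_url):
    # Requires the phoenix-spark and phoenix-client jars on the
    # driver/executor classpath; returns a DataFrame backed by the
    # Phoenix table without going through the Hive metastore.
    return (spark.read
            .format("org.apache.phoenix.spark")
            .options(**phoenix_read_options(table, zk_url))
            .load())
```

For example, `load_phoenix_table(spark, "AJMIDE_DW.PART_DEVICE", "zk-host:2181")` (table and host are illustrative) gives a DataFrame that can be queried directly, with no Hive external table involved.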
So my question is: does Phoenix support querying, via Spark SQL, a Hive
external table that is mapped from Phoenix?
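For context, the kind of Hive-side mapping this question assumes is created with the PhoenixStorageHandler. The snippet below only holds an illustrative DDL string modeled on the Phoenix Hive integration docs; the table, column, and host names are placeholders, not my actual schema:

```python
# Illustrative DDL for a Hive external table mapped to a Phoenix
# table via the PhoenixStorageHandler; all names and hosts below
# are placeholders.
PHOENIX_EXT_TABLE_DDL = """
CREATE EXTERNAL TABLE part_device (
  device_id STRING,
  event_date STRING
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
  "phoenix.table.name" = "PART_DEVICE",
  "phoenix.zookeeper.quorum" = "zk-host",
  "phoenix.zookeeper.znode.parent" = "/hbase",
  "phoenix.zookeeper.client.port" = "2181",
  "phoenix.rowkeys" = "device_id"
)
"""
print(PHOENIX_EXT_TABLE_DDL.strip())
```

Querying such a table works from Beeline; the failure happens when the same table is queried through Spark SQL with Hive support enabled.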
I am working on HDP 3.0 (Phoenix 5.0, HBase 2.0, Hive 3.1.0, Spark 2.3.1) and
am facing the issue mentioned in the subject.
I tried to solve this problem but failed; I found some similar questions on
the internet, but the answers did not work for me.
My submit command (with --jars before the application script, so that
spark-submit does not treat it as an application argument):
spark-submit --jars \
/usr/hdp/current/phoenix-client/lib/phoenix-hive-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/phoenix-client/lib/hadoop-mapreduce-client-core.jar\
,/usr/hdp/current/phoenix-client/lib/phoenix-core-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/phoenix-client/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-metastore-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-common-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hbase-client-2.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hbase-mapreduce-2.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-serde-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-shims-3.1.0.3.0.0.0-1634.jar \
test3.py
The log is attached; the demo code is below:
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder \
        .appName("test") \
        .enableHiveSupport() \
        .getOrCreate()
    df = spark.sql("select count(*) from ajmide_dw.part_device")
    df.show()
Similar Issues:
https://community.hortonworks.com/questions/140097/facing-issue-from-spark-sql.html
https://stackoverflow.com/questions/51501044/unable-to-access-hive-external-tables-from-spark-shell
Any comment or suggestion is appreciated!
Thanks,
Shi-Cheng, Ma