Hello All,

On the Apache Phoenix homepage, two integrations are listed: Apache Spark
Integration and the Phoenix Storage Handler for Apache Hive.
Following that guidance, I can query a Phoenix table from the beeline CLI, and
I can load a Phoenix table as a DataFrame using Spark SQL.
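
For example, the direct DataFrame load works for me along these lines (a
minimal sketch; the table name and ZooKeeper URL are placeholders, not my
real ones):

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("phoenix-direct").getOrCreate()

  # Load a Phoenix table straight through the phoenix-spark connector;
  # "MY_TABLE" and the zkUrl value are placeholders.
  df = spark.read \
      .format("org.apache.phoenix.spark") \
      .option("table", "MY_TABLE") \
      .option("zkUrl", "zk-host:2181") \
      .load()
  df.show()
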
So my question is:

Does Phoenix support querying, through Spark SQL, a Hive external table that
is mapped from Phoenix?
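
To make the mapping concrete: the Hive external table is defined with the
Phoenix Storage Handler, roughly like the sketch below. The DDL itself is run
from beeline; I hold it in a Python string here purely for reference, and
every table, column, and host name in it is a placeholder, not my real schema.

  # Reference sketch of the PhoenixStorageHandler DDL (executed from
  # beeline, not from Spark). All names below are placeholders.
  HIVE_DDL = """
  CREATE EXTERNAL TABLE part_device (
    device_id STRING,
    ts        BIGINT
  )
  STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
  TBLPROPERTIES (
    "phoenix.table.name" = "PART_DEVICE",
    "phoenix.zookeeper.quorum" = "zk-host",
    "phoenix.zookeeper.znode.parent" = "/hbase",
    "phoenix.zookeeper.client.port" = "2181",
    "phoenix.rowkeys" = "device_id"
  )
  """
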

I am working on HDP 3.0 (Phoenix 5.0, HBase 2.0, Hive 3.1.0, Spark 2.3.1) and
am facing the issue mentioned in the subject.
I tried to solve this problem but failed; I found some similar questions on
the internet, but the answers did not work for me.

My submit command (--jars is placed before the application file, so that
spark-submit picks the jars up instead of treating them as application
arguments):

  spark-submit --jars \
  /usr/hdp/current/phoenix-client/lib/phoenix-hive-5.0.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/phoenix-client/lib/hadoop-mapreduce-client-core.jar\
  ,/usr/hdp/current/phoenix-client/lib/phoenix-core-5.0.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/phoenix-client/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hive-metastore-3.1.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hive-common-3.1.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hbase-client-2.0.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hbase-mapreduce-2.0.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hive-serde-3.1.0.3.0.0.0-1634.jar\
  ,/usr/hdp/current/hive-client/lib/hive-shims-3.1.0.3.0.0.0-1634.jar \
  test3.py
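
For completeness, the same jars can also be supplied from inside the
application through the standard spark.jars property instead of --jars; a
minimal sketch (the jar list is abbreviated to the first two entries):

  from pyspark.sql import SparkSession

  # spark.jars takes a comma-separated list of jar paths, equivalent to
  # --jars on spark-submit. Only the first two jars from the command
  # above are shown; the rest follow the same pattern.
  jars = ",".join([
      "/usr/hdp/current/phoenix-client/lib/phoenix-hive-5.0.0.3.0.0.0-1634.jar",
      "/usr/hdp/current/phoenix-client/lib/phoenix-core-5.0.0.3.0.0.0-1634.jar",
      # ... remaining jars from the spark-submit command above
  ])

  spark = SparkSession.builder \
      .appName("test") \
      .config("spark.jars", jars) \
      .enableHiveSupport() \
      .getOrCreate()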

The log is attached, and the demo code is below:

  from pyspark.sql import SparkSession

  if __name__ == '__main__':
      # Hive support is enabled so that spark.sql() can resolve tables
      # registered in the Hive metastore, including the Phoenix-backed
      # external table.
      spark = SparkSession.builder \
          .appName("test") \
          .enableHiveSupport() \
          .getOrCreate()
      # This is the query that fails against the Phoenix-backed table.
      df = spark.sql("select count(*) from ajmide_dw.part_device")
      df.show()
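
For comparison, the direct phoenix-spark route does work for me; the sketch
below mirrors the failing count through a temp view instead of the Hive
metastore (the table name and zkUrl are placeholders, and the read is the
same direct load shown earlier):

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("test-direct").getOrCreate()

  # Read the Phoenix table through the phoenix-spark connector,
  # bypassing the Hive external table entirely.
  df = spark.read \
      .format("org.apache.phoenix.spark") \
      .option("table", "PART_DEVICE") \
      .option("zkUrl", "zk-host:2181") \
      .load()

  # Run the same count through a temp view instead of the Hive table.
  df.createOrReplaceTempView("part_device")
  spark.sql("select count(*) from part_device").show()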


Similar Issues:
https://community.hortonworks.com/questions/140097/facing-issue-from-spark-sql.html
https://stackoverflow.com/questions/51501044/unable-to-access-hive-external-tables-from-spark-shell

Any comment or suggestion is appreciated!

Thanks,
Shi-Cheng, Ma
