GitHub user jankogasic reopened a discussion: Connecting PySpark with Hive tables
Hello, I am trying to use PySpark to access Kyuubi and Spark. The issue I have is with the Hive dialect: the queries that end up on the cluster have some weird syntax, and they fail because of that.

STEPS TO REPRODUCE:

- sudo apt install -y openjdk-17-jdk
- export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
- export PATH="$JAVA_HOME/bin:$PATH"
- pip install pyspark 'pyspark[sql]' 'pyspark[pandas_on_spark]'
- sudo apt install -y krb5-user
- sudo cp krb5.conf /etc/
- kinit -kt janko-gasic.keytab [email protected]

```
import pyspark

print(pyspark.__file__)
print(pyspark.__version__)

from pyspark.sql import SparkSession

DRIVER_JAR = "./kyuubi-hive-jdbc-shaded-1.10.2.jar"
DIALECT_JAR = "./kyuubi-extension-spark-jdbc-dialect_2.12-1.10.2.jar"

spark = (
    SparkSession.builder
    .appName("KyuubiJDBC")
    .config("spark.jars", f"{DRIVER_JAR},{DIALECT_JAR}")
    # make sure the driver also sees them
    .config("spark.driver.extraClassPath", f"{DRIVER_JAR}:{DIALECT_JAR}")
    # register the dialect (this is what enables ARRAY/MAP/STRUCT over JDBC)
    .config("spark.sql.extensions", "org.apache.spark.sql.dialect.KyuubiSparkJdbcDialectExtension")
    .getOrCreate()
)

jdbc_url = (
    "jdbc:kyuubi://spark1.lan.bla1.us:2181,spark2.lan.bla1.us:2181,"
    ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi"
    ";auth=KERBEROS;principal=kyuubi/[email protected];ssl=true"
)

try:
    df = (
        spark.read.format("jdbc")
        .option("driver", "org.apache.kyuubi.jdbc.KyuubiHiveDriver")
        .option("url", jdbc_url)
        .option("query", "select 1")  # simple query
        .load()
    )
except Exception as e:
    print(e)
finally:
    spark.stop()

df.printSchema()
df.show(5)
```

My error is

```
Py4JJavaError: An error occurred while calling o123.showString.
: java.lang.NullPointerException: Cannot invoke "org.apache.spark.SparkEnv.rpcEnv()" because the return value of "org.apache.spark.SparkEnv$.get()" is null
```

pyspark 4.0.1 (also tried 3.5 but no luck)
kyuubi 1.10.2

GitHub link: https://github.com/apache/kyuubi/discussions/7240
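
One possible cause, judging only from the snippet as posted: `spark.stop()` runs in the `finally` block before `df.printSchema()` and `df.show(5)`, so `showString` is called against an already stopped session, which would be consistent with `SparkEnv.get()` returning null. A minimal sketch, assuming that ordering is the problem, keeps every DataFrame action inside the `try` and stops the session last. The names `run_query` and `main` are hypothetical helpers introduced for illustration; the driver class and JDBC options are the ones from the snippet above, and the jar and dialect configuration is left out for brevity.

```python
def run_query(spark, jdbc_url, query="select 1"):
    """Read over JDBC and materialize rows while the session is still alive.

    `run_query` is a hypothetical helper; `spark` is expected to be an
    active SparkSession.
    """
    df = (
        spark.read.format("jdbc")
        .option("driver", "org.apache.kyuubi.jdbc.KyuubiHiveDriver")
        .option("url", jdbc_url)
        .option("query", query)
        .load()
    )
    df.printSchema()
    # Collect before the caller stops the session; a DataFrame is lazy and
    # cannot be evaluated once its SparkContext has been shut down.
    return df.limit(5).collect()


def main(jdbc_url):
    # Imported here so run_query can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    # Add the spark.jars / dialect settings from the original snippet here.
    spark = SparkSession.builder.appName("KyuubiJDBC").getOrCreate()
    try:
        for row in run_query(spark, jdbc_url):
            print(row)
    finally:
        # Stop only after all DataFrame actions have finished.
        spark.stop()
```

If the error persists with this ordering, the stack trace from the failing action should then point at the JDBC or dialect layer rather than at a stopped `SparkEnv`.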
