Varun Rao created SPARK-28523:
---------------------------------

             Summary: Can't SELECT from a partitioned table using the Hive Warehouse Connector
                 Key: SPARK-28523
                 URL: https://issues.apache.org/jira/browse/SPARK-28523
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Varun Rao


I'm having an issue SELECTing from a partition of a partitioned table using the
Hive Warehouse Connector: the HWC read fails, while Spark SQL executes the same
SELECT statement without any problems.

 

As a side note, I don't see any issues filed against the Hive Warehouse
Connector on Spark's JIRA. Can someone point me to where HWC issues are tracked?

 

Below is the code I ran to reproduce the issue.

pyspark --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.21-1.jar \
  --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.21-1.zip \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://c372-node2.squadron-labs.com:2181,c372-node3.squadron-labs.com:2181,c372-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
  --conf spark.hadoop.hive.zookeeper.quorum="c372-node2.squadron-labs.com:2181;c372-node3.squadron-labs.com:2181;c372-node4.squadron-labs.com:2181"
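For reference, the same configuration can be set from a standalone PySpark
script rather than the shell; a minimal sketch, with the jar/zip paths and
ZooKeeper hosts copied verbatim from the command above (illustrative only,
not a verified workaround):

from pyspark.sql import SparkSession

# Sketch: standalone equivalent of the pyspark shell invocation above.
# All paths, hosts, and config values are copied from this report.
spark = (
    SparkSession.builder
    .appName("hwc-partition-read-repro")
    .config("spark.jars", "/usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.21-1.jar")
    .config("spark.submit.pyFiles", "/usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.21-1.zip")
    .config("spark.sql.hive.hiveserver2.jdbc.url",
            "jdbc:hive2://c372-node2.squadron-labs.com:2181,"
            "c372-node3.squadron-labs.com:2181,"
            "c372-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;"
            "zooKeeperNamespace=hiveserver2-interactive")
    .config("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
    .config("spark.hadoop.hive.zookeeper.quorum",
            "c372-node2.squadron-labs.com:2181;"
            "c372-node3.squadron-labs.com:2181;"
            "c372-node4.squadron-labs.com:2181")
    .getOrCreate()
)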
 

 

from pyspark_llap import HiveWarehouseSession

hive = HiveWarehouseSession.session(spark).build()

# Create the same table through Spark SQL and through HWC.
query = "CREATE TABLE test_table_2 (k BIGINT, s SMALLINT) PARTITIONED BY (d STRING)"
spark.sql(query)
hive.executeUpdate(query)

# Insert a row into a static partition through both paths.
query2 = "INSERT OVERWRITE TABLE test_table_2 PARTITION(d='partition_string') VALUES (666, 42)"
spark.sql(query2)
hive.executeUpdate(query2)

# Read the partition back through both paths. The Spark SQL read
# succeeds; the HWC read fails with the stack trace below.
query3 = "SELECT * FROM test_table_2 WHERE d='partition_string'"
spark.sql(query3).show()
hive.executeQuery(query3).show()
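The same read can also be attempted through the HWC DataFrame API. A minimal
sketch for comparison; hive.table() is part of the HWC API, but since it also
reads through LLAP it may hit the same error (assumption, not verified):

# Sketch: alternative HWC read path for the same partition, for
# comparison only; not a verified workaround for the error below.
df = hive.table("test_table_2").filter("d = 'partition_string'")
df.show()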

 

RuntimeException: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:66)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:619)
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:388)
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:55)
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:140)
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:136)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstanceForHost(LlapBaseInputFormat.java:391)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:373)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:156)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.getRecordReader(HiveWarehouseDataReader.java:71)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.<init>(HiveWarehouseDataReader.java:49)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.getDataReader(HiveWarehouseDataReaderFactory.java:72)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:64)
    ... 18 more
Caused by: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at shadecurator.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
    at shadecurator.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
    at shadecurator.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
    at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
    at shadecurator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
    at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
    at shadecurator.org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
    at shadecurator.org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:326)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:303)
    at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:597)
    ... 29 more
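
The trace shows the failure occurs while the HWC data reader resolves LLAP
daemon instances through the shaded Curator/ZooKeeper client. As a first
diagnostic, a plain TCP reachability check of the ZooKeeper quorum from the
executor hosts can rule out basic connectivity problems; a minimal sketch
using only the Python standard library, with hostnames copied from the
command above (this checks TCP connectivity only, not ZooKeeper session
health):

import socket

# Sketch: check that each ZooKeeper quorum member accepts connections
# on port 2181. Hostnames are copied from this report.
quorum = [
    ("c372-node2.squadron-labs.com", 2181),
    ("c372-node3.squadron-labs.com", 2181),
    ("c372-node4.squadron-labs.com", 2181),
]
for host, port in quorum:
    try:
        conn = socket.create_connection((host, port), timeout=5)
        conn.close()
        print("{0}:{1} reachable".format(host, port))
    except (socket.error, socket.timeout) as e:
        print("{0}:{1} NOT reachable: {2}".format(host, port, e))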


