Hi,

I'm trying to run an SQL query on a Hive table that is stored in HBase. I'm using:
- Spark 1.6.0
- HDP 2.2
- Hive 0.14.0
- HBase 0.98.4

I managed to configure a working classpath, but I have the following problems:

1) I have a UDF defined in the Hive Metastore (FUNCS table). Spark cannot use it.

File "/opt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.sql.
: org.apache.spark.sql.AnalysisException: undefined function dwh.str_to_map_int_str; line 55 pos 30
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:69)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:69)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:68)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:64)
    at scala.util.Try.getOrElse(Try.scala:77)
    at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:64)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:573)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:570)

2) When I run SQL without this function, Spark tries to connect to ZooKeeper on localhost. I made a tunnel from localhost to one of the ZooKeeper servers, but that is not a real solution.

16/01/28 10:09:18 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
16/01/28 10:09:18 INFO ZooKeeper: Client environment:host.name=j4.jupyter1
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.version=1.8.0_66
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-8-oracle/jre
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.class.path=/opt/spark/lib/mysql-connector-java-5.1.35-bin.jar:/opt/spark/lib/dwh-hbase-connector.jar:/opt/spark/lib/hive-hbase-handler-1.2.1.spark.jar:/opt/spark/lib/hbase-server.jar:/opt/spark/lib/hbase-common.jar:/opt/spark/lib/dwh-commons.jar:/opt/spark/lib/guava.jar:/opt/spark/lib/hbase-client.jar:/opt/spark/lib/hbase-protocol.jar:/opt/spark/lib/htrace-core.jar:/opt/spark/conf/:/opt/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark/lib/datanucleus-core-3.2.10.jar:/etc/hadoop/conf/
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/01/28 10:09:18 INFO ZooKeeper: Client environment:java.compiler=<NA>
16/01/28 10:09:18 INFO ZooKeeper: Client environment:os.name=Linux
16/01/28 10:09:18 INFO ZooKeeper: Client environment:os.arch=amd64
16/01/28 10:09:18 INFO ZooKeeper: Client environment:os.version=3.13.0-24-generic
16/01/28 10:09:18 INFO ZooKeeper: Client environment:user.name=mbrynski
16/01/28 10:09:18 INFO ZooKeeper: Client environment:user.home=/home/mbrynski
16/01/28 10:09:18 INFO ZooKeeper: Client environment:user.dir=/home/mbrynski
16/01/28 10:09:18 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x36079f06, quorum=localhost:2181, baseZNode=/hbase
16/01/28 10:09:18 INFO RecoverableZooKeeper: Process identifier=hconnection-0x36079f06 connecting to ZooKeeper ensemble=localhost:2181
16/01/28 10:09:18 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/01/28 10:09:18 INFO ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
16/01/28 10:09:18 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15254709ed3c8e1, negotiated timeout = 40000
16/01/28 10:09:18 INFO ZooKeeperRegistry: ClusterId read in ZooKeeper is null

3) After setting up the tunnel, I get a NullPointerException.

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:269)
    at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:241)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:62)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1203)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1164)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
    ... 91 more

Do you have any ideas how to resolve these problems?

Regards,
--
Maciek Bryński