Spark - Livy - Hive Table User
Tables created through Spark via Livy are owned by the livy user:

    drwxrwxrwx+ - livy hdfs 0 2017-12-18 09:38 /apps/hive/warehouse/dev.db/tbl

whether created with:

1. df.write.saveAsTable()
2. spark.sql("CREATE TABLE tbl (key INT, value STRING)")

whereas a table created from the Hive shell is owned by 'hive'/the proxy user:

    drwxrwxrwx+ - guest hdfs 0 2017-12-18 08:51 /apps/hive/warehouse/dev.db/tbll

Is there a property via which the owner can be set for Spark?
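In case it helps: Livy supports user impersonation, so the fix may be configuration rather than a Spark property. With livy.impersonation.enabled=true in livy.conf (and matching hadoop.proxyuser.livy.* rules in core-site.xml), a session created with a proxyUser runs, and writes tables, as that user. A minimal sketch against Livy's REST API; the host, port, and user below are illustrative, not taken from the report above:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LivyProxyUserSession {
        public static void main(String[] args) throws Exception {
            // "proxyUser" asks the Livy server to run the session as that user,
            // so tables it creates are owned by "guest" instead of "livy".
            String body = "{\"kind\": \"spark\", \"proxyUser\": \"guest\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://livy-host:8998/sessions"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }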
RE: Cloudera - How to switch to the newly added Spark service (Spark2) from Spark 1.6 in CDH 5.12
To set Spark2 as the default, refer to https://www.cloudera.com/documentation/spark2/latest/topics/spark2_admin.html#default_tools

-----Original Message-----
From: Gaurav1809 [mailto:gauravhpan...@gmail.com]
Sent: Wednesday, September 20, 2017 9:16 AM
To: user@spark.apache.org
Subject: Cloudera - How to switch to the newly added Spark service (Spark2) from Spark 1.6 in CDH 5.12

Hello all,

I downloaded CDH and it comes with Spark 1.6. As per the step-by-step guide, I added Spark 2 to the services list, so now I can see both Spark 1.6 and Spark 2. But when I run spark-shell in a terminal window, it starts with Spark 1.6 only. How do I switch to Spark 2? Which _HOMEs or parameters do I need to set up? (Or do I need to delete the older service?) Any pointers will be helpful.

Thanks
Gaurav
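A note on the above, from how the Spark2 add-on service behaves in CDH (not stated in the reply): installing the Spark2 service adds parallel spark2-shell and spark2-submit commands alongside the Spark 1.6 spark-shell and spark-submit, so Spark 2 is usable immediately without deleting the older service; the linked page covers making the unversioned commands default to Spark 2.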
RE: SparkSession via HS2 - Error: Yarn application has already ended
While testing like this, it does not read the cluster's hive-site.xml or spark-env.sh (those settings had to be passed in via SparkSession.builder().config()). Is there a way to make it read the Spark config present on the cluster?

From: Sudha KS
Sent: Wednesday, July 5, 2017 6:45 PM
To: user@spark.apache.org
Subject: RE: SparkSession via HS2 - Error: Yarn application has already ended
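On the question of reading the cluster's Spark config: a SparkSession built in-process never loads spark-defaults.conf, since that file is applied by the spark-submit launcher rather than by SparkConf itself. One workaround is to load the file manually and apply it to the builder. A sketch, assuming the HS2 host has the Spark2 client configs at the usual HDP path (the path is illustrative):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;
    import org.apache.spark.sql.SparkSession;

    public class ClusterConfSession {
        public static SparkSession build() throws Exception {
            // spark-defaults.conf uses "key value" lines, a format that
            // java.util.Properties also accepts (whitespace separator, # comments).
            Properties defaults = new Properties();
            try (InputStream in = Files.newInputStream(
                    Paths.get("/usr/hdp/current/spark2-client/conf/spark-defaults.conf"))) {
                defaults.load(in);
            }
            SparkSession.Builder builder = SparkSession.builder()
                    .enableHiveSupport()
                    .master("yarn-client")
                    .appName("SampleSparkUDTF_yarnV1");
            // Apply every cluster default to the in-process builder.
            for (String key : defaults.stringPropertyNames()) {
                builder.config(key, defaults.getProperty(key));
            }
            return builder.getOrCreate();
        }
    }

hive-site.xml is simpler: enableHiveSupport() picks it up whenever the directory containing it is already on the JVM classpath, which it normally is inside HiveServer2.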
RE: SparkSession via HS2 - Error: Yarn application has already ended
For now, passing the config in SparkSession:

    SparkSession spark = SparkSession
        .builder()
        .enableHiveSupport()
        .master("yarn-client")
        .appName("SampleSparkUDTF_yarnV1")
        .config("spark.yarn.jars", "hdfs:///hdp/apps/2.6.1.0-129/spark2")
        .config("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
        .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
        .config("spark.executor.memory", "4g")
        .getOrCreate();

While testing via HS2, this is the error:

    beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver
    0: jdbc:hive2://localhost:10000> ……
    Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:102)
        at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
        ... 18 more

Is there a way to resolve this error?

On Wed, Jul 5, 2017 at 2:01 PM, Sudha KS <sudha...@fuzzylogix.com> wrote:
> The property "spark.yarn.jars" is available via /usr/hdp/current/spark2-client/conf/spark-defaults.conf. Is there any other way to set/read/pass this property "spark.yarn.jars"?
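Two suggestions, neither from this thread: first, the YARN-side reason for the AM failure should be visible via yarn logs -applicationId <appId>; second, per the Spark docs spark.yarn.jars takes a list of jar files (globs allowed) rather than a bare directory, so the value may need a trailing /*, or spark.yarn.archive can point at a single archive of the jars. Illustrative variants (the archive name is my assumption about the HDP layout):

    // spark.yarn.jars accepts globs of jar files:
    .config("spark.yarn.jars", "hdfs:///hdp/apps/2.6.1.0-129/spark2/*")
    // or a single archive of them:
    // .config("spark.yarn.archive", "hdfs:///hdp/apps/2.6.1.0-129/spark2/spark2-hdp-yarn-archive.tar.gz")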
RE: SparkSession via HS2 - Error -spark.yarn.jars not read
The property "spark.yarn.jars" available via /usr/hdp/current/spark2-client/conf/spark-default.conf spark.yarn.jars hdfs://ambari03.fuzzyl.com:8020/hdp/apps/2.6.1.0-129/spark2 Is there any other way to set/read/pass this property "spark.yarn.jars" ? From: Sudha KS [mailto:sudha...@fuzzylogix.com] Sent: Wednesday, July 5, 2017 1:51 PM To: user@spark.apache.org Subject: SparkSession via HS2 - Error -spark.yarn.jars not read Why does "spark.yarn.jars" property not read, in this HDP 2.6 , Spark2.1.1 cluster: 0: jdbc:hive2://localhost:1/db> set spark.yarn.jars; +--+--+ | set | +--+--+ | spark.yarn.jars=hdfs://ambari03.fuzzyl.com:8020/hdp/apps/2.6.1.0-129/spark2 | +--+--+ 1 row selected (0.101 seconds) 0: jdbc:hive2://localhost:1/db> Error during launch of a SparkSession via HS2: Caused by: java.lang.IllegalStateException: Library directory '/hadoop/yarn/local/usercache/hive/appcache/application_1499235958765_0042/container_e04_1499235958765_0042_01_05/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built. at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:260) at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:380) at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:570) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156) at org.apache.spark.SparkContext.(SparkContext.scala:509) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320) at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868) at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860) at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97) at SparkHiveUDTF.process(SparkHiveUDTF.java:78) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) ... 18 more
SparkSession via HS2 - Error -spark.yarn.jars not read
Why does "spark.yarn.jars" property not read, in this HDP 2.6 , Spark2.1.1 cluster: 0: jdbc:hive2://localhost:1/db> set spark.yarn.jars; +--+--+ | set | +--+--+ | spark.yarn.jars=hdfs://ambari03.fuzzyl.com:8020/hdp/apps/2.6.1.0-129/spark2 | +--+--+ 1 row selected (0.101 seconds) 0: jdbc:hive2://localhost:1/db> Error during launch of a SparkSession via HS2: Caused by: java.lang.IllegalStateException: Library directory '/hadoop/yarn/local/usercache/hive/appcache/application_1499235958765_0042/container_e04_1499235958765_0042_01_05/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built. at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:260) at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:380) at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:570) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156) at org.apache.spark.SparkContext.(SparkContext.scala:509) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320) at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868) at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860) at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97) at SparkHiveUDTF.process(SparkHiveUDTF.java:78) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) ... 18 more
SparkSession via HS2 - is it supported?
This is the code: a Java class extending org.apache.hadoop.hive.ql.udf.generic.GenericUDTF, which creates a SparkSession as

    SparkSession spark = SparkSession.builder()
        .enableHiveSupport()
        .master("yarn-client")
        .appName("SampleSparkUDTF_yarnV1")
        .getOrCreate();

and tries to read a table in the Hive DB:

    Dataset inputData = spark.read().table(tableName);
    Long countRows = inputData.count();

Environment: HDP-2.5.3.0, Spark 2.0.0

Steps:
1. Copied this custom UDTF jar into HDFS and also into auxlib.
2. Copied /usr/hdp/<2.5.x>/spark2/jars/*.jar into /usr/hdp/<2.5.x>/hive/auxlib/.
3. Connected to HS2 using beeline to run this Spark UDTF:

    beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver

    CREATE TABLE TestTable (i int);
    INSERT INTO TestTable VALUES (1);

    0: jdbc:hive2://localhost:10000/> CREATE FUNCTION SparkUDT AS 'SparkHiveUDTF' using jar 'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar';
    INFO : converting to local hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar
    INFO : Added [/tmp/69366d0d-6777-4860-82c0-c61482ccce87_resources/sparkHiveGenericUDTF-1.0.jar] to class path
    INFO : Added resources: [hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar]
    No rows affected (0.125 seconds)

    0: jdbc:hive2://localhost:10000/> SELECT SparkUDT('tbl','TestTable');

The query failed with:

    failed, info=[Error: Failure while running task: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 17 more
    Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97)
        at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
        at ...
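For context, a minimal sketch of the kind of UDTF described above. The class name and call sites match the stack trace (SparkHiveUDTF.process calling sparkJob), but the body is a reconstruction, not the poster's actual code:

    import java.util.Arrays;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.spark.sql.SparkSession;

    public class SparkHiveUDTF extends GenericUDTF {

        @Override
        public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            // One output column: the row count of the table read through Spark.
            return ObjectInspectorFactory.getStandardStructObjectInspector(
                    Arrays.asList("cnt"),
                    Arrays.<ObjectInspector>asList(PrimitiveObjectInspectorFactory.javaLongObjectInspector));
        }

        @Override
        public void process(Object[] args) throws HiveException {
            // Second argument is the table name, matching SELECT SparkUDT('tbl','TestTable').
            forward(new Object[] { sparkJob(args[1].toString()) });
        }

        private long sparkJob(String tableName) {
            // Building a yarn-client SparkSession inside a Hive Tez task is
            // exactly the step that fails in the trace above.
            SparkSession spark = SparkSession.builder()
                    .enableHiveSupport()
                    .master("yarn-client")
                    .appName("SampleSparkUDTF_yarnV1")
                    .getOrCreate();
            return spark.read().table(tableName).count();
        }

        @Override
        public void close() throws HiveException {
            // Nothing to clean up in this sketch.
        }
    }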