Re: hive on spark - why is it so hard?
so... i made some progress after much copying of jar files around (as alluded to by Gopal previously on this thread). following the instructions here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started and building as instructed:

    ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

will leave off about a dozen or so jar files that spark will need. i ended up copying the missing jars to $SPARK_HOME/jars, but i would have preferred to just add a path (or paths) to the spark classpath - i did not find any effective way to do that. in hive you can specify HIVE_AUX_JARS_PATH, but i don't see an analogous var in spark, and i don't think it inherits the hive classpath.

anyway, a simple query is now working under Hive on Spark, so i think i might be over the hump. now it's a matter of comparing the performance with Tez.

Cheers,
Stephen.

On Wed, Sep 27, 2017 at 9:37 PM, Stephen Sprague wrote:
> ok.. getting further. seems now i have to deploy hive to all nodes in the
> cluster - don't think i had to do that before but not a big deal to do it
> now.
>
> for me:
> HIVE_HOME=/usr/lib/apache-hive-2.3.0-bin/
> SPARK_HOME=/usr/lib/spark-2.2.0-bin-hadoop2.6
>
> on all three nodes now.
>
> i started spark master on the namenode and i started spark slaves (2) on
> two datanodes of the cluster.
>
> so far so good.
>
> now i run my usual test command:
>
> $ hive --hiveconf hive.root.logger=DEBUG,console -e 'set
> hive.execution.engine=spark; select date_key, count(*) from
> fe_inventory.merged_properties_hist group by 1 order by 1;'
>
> i get a little further now and find the stderr from the Spark Web UI
> interface (nice) and it reports this:
>
> 17/09/27 20:47:35 INFO WorkerWatcher: Successfully connected to
> spark://Worker@172.19.79.127:40145
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>         at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
>         at org.apache.hive.spark.client.rpc.RpcConfiguration.<init>(RpcConfiguration.java:47)
>         at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
>         at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
>         ... 6 more
>
> searching around the internet i find this is probably a compatibility
> issue.
>
> i know. i know. no surprise here.
>
> so i guess i just got to the point where everybody else is... build spark
> w/o hive.
>
> lemme see what happens next.
>
> On Wed, Sep 27, 2017 at 7:41 PM, Stephen Sprague wrote:
>
>> thanks. I haven't had a chance to dig into this again today but i do
>> appreciate the pointer. I'll keep you posted.
>>
>> On Wed, Sep 27, 2017 at 10:14 AM, Sahil Takiar wrote:
>>
>>> You can try increasing the value of hive.spark.client.connect.timeout.
>>> Would also suggest taking a look at the HoS Remote Driver logs. The driver
>>> gets launched in a YARN container (assuming you are running Spark in
>>> yarn-client mode), so you just have to find the logs for that container.
>>>
>>> --Sahil
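For anyone following along: the timeout Sahil mentions is an ordinary Hive config property, so one way to try his suggestion is to raise it per session before kicking off the query. A sketch only - the 30000ms value is an arbitrary example, not a recommendation, and the query is just the test query from earlier in the thread:

    -- raise the Hive-on-Spark client connect timeout for this session only
    set hive.spark.client.connect.timeout=30000ms;
    -- the companion server-side timeout can be raised the same way if needed
    set hive.spark.client.server.connect.timeout=300000ms;
    set hive.execution.engine=spark;
    select date_key, count(*) from fe_inventory.merged_properties_hist group by 1 order by 1;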
>>>
>>> On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague wrote:
>>>
>>>> i _seem_ to be getting closer. maybe it's just wishful thinking.
>>>> here's where i'm at now.
>>>>
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: 17/09/26 21:10:38 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: {
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "action" : "CreateSubmissionResponse",
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "message" : "Driver successfully submitted as driver-20170926211038-0003",
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "serverSparkVersion" : "2.2.0",
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "submissionId" : "driver-20170926211038-0003",
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl:   "success" : true
>>>> 2017-09-26T21:10:38,892 INFO [stderr-redir-1] client.SparkClientImpl: }
>>>> 2017-09-26T21:10:45,701 DEBUG [IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: close
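On the classpath question at the top of the thread: Spark does have generic extra-classpath knobs, even if there is no direct HIVE_AUX_JARS_PATH equivalent. A sketch of what could go in $SPARK_HOME/conf/spark-defaults.conf instead of copying jars into $SPARK_HOME/jars - untested with Hive on Spark (and it may be exactly what Stephen tried without success); /usr/lib/hive-extra-jars is a made-up directory:

    # prepend a directory of extra jars to the driver and executor classpaths
    spark.driver.extraClassPath    /usr/lib/hive-extra-jars/*
    spark.executor.extraClassPath  /usr/lib/hive-extra-jars/*

Both properties are standard Spark configuration; the usual caveat is that the same path must exist on every node running an executor.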
Killing JVM on Metastore on OnOutOfMemory error
Hi *,

I am trying to set up an OnOutOfMemory kill in hive-env.sh so that the JVM shuts down on an OOM error in the Hive Metastore. The code is:

    if [ "$SERVICE" = "metastore" ]; then
      export HADOOP_OPTS="$HADOOP_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\" "
    fi

On starting the Hive Metastore, I am getting:

    h...@hadoopdev7.mlan:~> hive --service metastore
    Unrecognized option: -9
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
    Unable to determine Hadoop version information.
    'hadoop version' returned: Unrecognized option: -9
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.

Any ideas?

--
Regards,
Akash Mishra.

"It's not our abilities that make us, but our decisions." --Albus Dumbledore
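In case it helps whoever hits this next: the "Unrecognized option: -9" suggests the escaped quotes around kill -9 %p are lost by the time HADOOP_OPTS is expanded onto the java command line, so the JVM sees -9 as a standalone option. A workaround that is sometimes suggested is to avoid embedded spaces entirely by pointing the option at a wrapper script. A sketch, untested here; /usr/local/bin/kill_on_oom.sh is a hypothetical path:

    # in hive-env.sh: no spaces inside the JVM option, so nothing to mis-parse
    if [ "$SERVICE" = "metastore" ]; then
      export HADOOP_OPTS="$HADOOP_OPTS -XX:OnOutOfMemoryError=/usr/local/bin/kill_on_oom.sh"
    fi

and the wrapper itself (chmod +x):

    #!/bin/sh
    # %p is not available inside a script, but the JVM that triggered
    # OnOutOfMemoryError runs this script as a child, so kill the parent
    kill -9 $PPID

On JDK 8u92 and later, -XX:+ExitOnOutOfMemoryError may do the job without any external command at all.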
Re: Hplsql Cursor Loop Java Error - Resolved
Thanks! It should not throw an NPE anyway; I will create a ticket for:

    Exception in thread "main" java.lang.NullPointerException
        at org.apache.hive.hplsql.Select.getIntoCount(Select.java:405)

Thanks,
Dmitry

On Sun, Oct 1, 2017 at 1:43 AM, Srinivas Alavala wrote:
>
> I forgot to add the END LOOP statement. It is working fine.
>
> Thanks,
> Srini Alavala
>
> -----Srinivas Alavala/RCGIT wrote: -----
> To: user@hive.apache.org
> From: Srinivas Alavala/RCGIT
> Date: 09/30/2017 05:37PM
> Cc: dmtolp...@gmail.com
> Subject: Hplsql Cursor Loop Java Error
>
> Hi Guys,
>
> I am very excited about hplsql. I have started writing code in a Cloudera
> CDH 5.12.X environment with Kerberos security. I am getting a Java error
> while trying to execute a cursor loop statement, as shown below. How do I
> troubleshoot this error? The trace didn't show much info.
>
> I want to find out if this is production ready. Please guide us.
>
> CREATE PROCEDURE getPromoteList(IN position STRING, OUT status STRING)
> BEGIN
>
> FOR cur IN
>   (SELECT customer_cid, customer_first_name FROM Table_Name where
>    prowess_period_id=201601 and customer_pay_title = 'Value')
> LOOP
>   DBMS_OUTPUT.PUT_LINE(cur.customer_cid);
> END;
>
> DECLARE status STRING;
> CALL getPromoteList(201601, status);
> PRINT status;
>
> Ln:1 CREATE FUNCTION hello
> Ln:1 CREATE PROCEDURE getPromoteList
> Ln:10 DECLARE status STRING
> Ln:11 EXEC PROCEDURE getPromoteList
> Ln:11 SET PARAM position = 201601
> Ln:11 SET PARAM status = null
> Ln:5 SELECT
> Ln:5
>
> [main] INFO org.apache.hive.jdbc.Utils - Supplied authorities: *
> [main] INFO org.apache.hive.jdbc.Utils - Resolved authority:
> [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Open connection: j*
>
> Starting query
> Query executed successfully (38.97 sec)
> Ln:5 SELECT completed successfully
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.hive.hplsql.Select.getIntoCount(Select.java:405)
>     at org.apache.hive.hplsql.Select.select(Select.java:88)
>     at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:1021)
>     at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:52)
>     at org.apache.hive.hplsql.HplsqlParser$Select_stmtContext.accept(HplsqlParser.java:15050)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1013)
>     at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:52)
>     at org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1023)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
>     at org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:454)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.Exec.visitBegin_end_block(Exec.java:930)
>     at org.apache.hive.hplsql.Exec.visitBegin_end_block(Exec.java:52)
>     at org.apache.hive.hplsql.HplsqlParser$Begin_end_blockContext.accept(HplsqlParser.java:549)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.HplsqlBaseVisitor.visitProc_block(HplsqlBaseVisitor.java:56)
>     at org.apache.hive.hplsql.HplsqlParser$Proc_blockContext.accept(HplsqlParser.java:756)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>     at org.apache.hive.hplsql.functions.Function.visit(Function.java:754)
>     at org.apache.hive.hplsql.functions.Function.execProc(Function.java:244)
>     at org.apache.hive.hplsql.Exec.visitCall_stmt(Exec.java:1797)
>     at org.apache.hive.hplsql.Exec.visitCall_stmt(Exec.java:52)
>     at org.apache.hive.hplsql.HplsqlParser$Call_stmtContext.accept(HplsqlParser.java:3191)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1013)
>     at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:52)
>     at org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1023)
>     at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>     at org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
>     at org.
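To close the loop for future readers: per Srini's follow-up, the root cause was the missing END LOOP statement. A sketch of the corrected block, untested; the table and column names are taken from the original post:

    CREATE PROCEDURE getPromoteList(IN position STRING, OUT status STRING)
    BEGIN
      FOR cur IN
        (SELECT customer_cid, customer_first_name FROM Table_Name
          WHERE prowess_period_id = 201601 AND customer_pay_title = 'Value')
      LOOP
        DBMS_OUTPUT.PUT_LINE(cur.customer_cid);
      END LOOP;   -- this was the missing statement
    END;

    DECLARE status STRING;
    CALL getPromoteList(201601, status);
    PRINT status;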