Re: hive on spark - why is it so hard?

2017-10-01 Thread Stephen Sprague
so...  i made some progress after much copying of jar files around (as
alluded to by Gopal previously on this thread).


following the instructions here:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

and doing this as instructed will leave out about a dozen jar files that
spark will need:

  ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz \
      "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

i ended up copying the missing jars to $SPARK_HOME/jars, but i would have
preferred to just add a path (or paths) to the spark classpath - i did not
find any effective way to do that. In hive you can specify
HIVE_AUX_JARS_PATH, but i don't see an analogous var in spark - i don't
think it inherits the hive classpath.
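
(two knobs that might be worth trying, though i haven't verified either
with Hive on Spark: SPARK_DIST_CLASSPATH in spark-env.sh, and the
extraClassPath properties in spark-defaults.conf - a sketch using my hive
install path:

  # spark-env.sh - untested sketch
  export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/apache-hive-2.3.0-bin/lib/*"

  # spark-defaults.conf - untested sketch
  spark.driver.extraClassPath   /usr/lib/apache-hive-2.3.0-bin/lib/*
  spark.executor.extraClassPath /usr/lib/apache-hive-2.3.0-bin/lib/*

either would beat hand-copying jars if it works.)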

anyway, a simple query is now working under Hive on Spark, so i think i
might be over the hump. Now it's a matter of comparing the performance with
Tez.

Cheers,
Stephen.


On Wed, Sep 27, 2017 at 9:37 PM, Stephen Sprague  wrote:

> ok.. getting further.  it seems i now have to deploy hive to all nodes in
> the cluster - i don't think i had to do that before, but it's not a big
> deal to do now.
>
> for me:
> HIVE_HOME=/usr/lib/apache-hive-2.3.0-bin/
> SPARK_HOME=/usr/lib/spark-2.2.0-bin-hadoop2.6
>
> on all three nodes now.
>
> i started spark master on the namenode and i started spark slaves (2) on
> two datanodes of the cluster.
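>
> (for reference, the stock standalone scripts do this - the master URL
> below is illustrative, with 7077 being spark's default master port:
>
>   $SPARK_HOME/sbin/start-master.sh
>   $SPARK_HOME/sbin/start-slave.sh spark://dwrdevnn1.sv2.trulia.com:7077
> )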
>
> so far so good.
>
> now i run my usual test command.
>
> $ hive --hiveconf hive.root.logger=DEBUG,console -e 'set
> hive.execution.engine=spark; select date_key, count(*) from
> fe_inventory.merged_properties_hist group by 1 order by 1;'
>
> i get a little further now and can find the stderr via the Spark Web UI
> (nice), which reports this:
>
> 17/09/27 20:47:35 INFO WorkerWatcher: Successfully connected to spark://Worker@172.19.79.127:40145
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>   at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
>   at org.apache.hive.spark.client.rpc.RpcConfiguration.<init>(RpcConfiguration.java:47)
>   at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
>   at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
>   ... 6 more
>
>
>
> searching around the internet i find this is probably a compatibility
> issue.
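>
> (one quick sanity check, untested: see whether the spark dist still
> bundles its own hive jars that could shadow the 2.3.0 ones, and which jar
> actually provides the class in question:
>
>   ls $SPARK_HOME/jars | grep -i hive
>
>   for j in $SPARK_HOME/jars/*.jar; do
>     unzip -l "$j" 2>/dev/null \
>       | grep -q 'hive/spark/client/rpc/RpcConfiguration' && echo "$j"
>   done
> )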
>
> i know. i know. no surprise here.
>
> so i guess i just got to the point where everybody else is... build spark
> w/o hive.
>
> lemme see what happens next.
>
>
>
>
>
> On Wed, Sep 27, 2017 at 7:41 PM, Stephen Sprague 
> wrote:
>
>> thanks.  I haven't had a chance to dig into this again today but i do
>> appreciate the pointer.  I'll keep you posted.
>>
>> On Wed, Sep 27, 2017 at 10:14 AM, Sahil Takiar 
>> wrote:
>>
>>> You can try increasing the value of hive.spark.client.connect.timeout.
>>> Would also suggest taking a look at the HoS Remote Driver logs. The driver
>>> gets launched in a YARN container (assuming you are running Spark in
>>> yarn-client mode), so you just have to find the logs for that container.
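>>>
>>> For example (the exact value is just illustrative):
>>>
>>>   set hive.spark.client.connect.timeout=30000ms;
>>>
>>> and once you have the application id of that container, something like:
>>>
>>>   yarn logs -applicationId <application_id>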
>>>
>>> --Sahil
>>>
>>> On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague 
>>> wrote:
>>>
 i _seem_ to be getting closer.  Maybe it's just wishful thinking.
 Here's where i'm at now.

 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
 17/09/26 21:10:38 INFO rest.RestSubmissionClient: Server responded with
 CreateSubmissionResponse:
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl: {
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
   "action" : "CreateSubmissionResponse",
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
   "message" : "Driver successfully submitted as 
 driver-20170926211038-0003",
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
   "serverSparkVersion" : "2.2.0",
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
   "submissionId" : "driver-20170926211038-0003",
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
   "success" : true
 2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl: }
 2017-09-26T21:10:45,701 DEBUG [IPC Client (425015667) connection to
 dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC Client
 (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr: close

Killing JVM on Metastore on OnOutOfMemory error

2017-10-01 Thread Akash Mishra
Hi *,

I am trying to set up an OnOutOfMemoryError kill in hive-env.sh so that the
JVM shuts down on an OOM error in the Hive Metastore.

The code is:

if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\" "
fi


On starting the Hive Metastore, I get:


h...@hadoopdev7.mlan:~> hive --service metastore
Unrecognized option: -9
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Unable to determine Hadoop version information.
'hadoop version' returned:
Unrecognized option: -9
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
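
The "Unrecognized option: -9" suggests the embedded quotes are being lost
when HADOOP_OPTS is re-expanded by the launcher scripts, so "kill" and "-9"
reach the JVM as separate arguments. A quote-free alternative I am
considering (untested here, and it needs JDK 8u92 or later):

if [ "$SERVICE" = "metastore" ]; then
  # no spaces or quotes to mangle: the JVM exits itself on the first OOM
  export HADOOP_OPTS="$HADOOP_OPTS -XX:+ExitOnOutOfMemoryError"
fi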



Any ideas?





-- 

Regards,
Akash Mishra.


"It's not our abilities that make us, but our decisions."--Albus Dumbledore


Re: Hplsql Cursor Loop Java Error - Resolved

2017-10-01 Thread Dmitry Tolpeko
Thanks! It should not throw an NPE anyway; I will create a ticket for this:

Exception in thread "main" java.lang.NullPointerException
at org.apache.hive.hplsql.Select.getIntoCount(Select.java:405)

Thanks,
Dmitry

On Sun, Oct 1, 2017 at 1:43 AM, Srinivas Alavala  wrote:

>
> I forgot to add the END LOOP statement. It is working fine.
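>
> For the archive, the loop portion now reads (only the END LOOP before the
> procedure's closing END is new):
>
> FOR cur IN
> (SELECT customer_cid, customer_first_name FROM Table_Name where
> prowess_period_id=201601 and customer_pay_title = 'Value')
> LOOP
> DBMS_OUTPUT.PUT_LINE(cur.customer_cid);
> END LOOP;
> END;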
>
> Thanks,
> Srini Alavala
>
> -Srinivas Alavala/RCGIT wrote: -
> To: user@hive.apache.org
> From: Srinivas Alavala/RCGIT
> Date: 09/30/2017 05:37PM
> Cc: dmtolp...@gmail.com
> Subject: Hplsql Cursor Loop Java Error
>
> Hi Guys,
>
> I am very excited about hplsql. I have started writing code in a Cloudera
> CDH 5.12.x environment with Kerberos security. I am getting a Java error
> while trying to execute a cursor loop statement, as shown below. How do I
> troubleshoot this error? The trace didn't show much info.
>
> I want to find out if this is production ready. Please guide us.
>
>
> CREATE PROCEDURE getPromoteList(IN position STRING, OUT status STRING)
> BEGIN
>
> FOR cur IN
> (SELECT customer_cid, customer_first_name FROM Table_Name where
> prowess_period_id=201601 and customer_pay_title = 'Value')
> LOOP
> DBMS_OUTPUT.PUT_LINE(cur.customer_cid);
> END;
>
> DECLARE status STRING;
> CALL getPromoteList(201601,status);
> PRINT status;
>
>
> Ln:1 CREATE FUNCTION hello
> Ln:1 CREATE PROCEDURE getPromoteList
> Ln:10 DECLARE status STRING
> Ln:11 EXEC PROCEDURE getPromoteList
> Ln:11 SET PARAM position = 201601
> Ln:11 SET PARAM status = null
> Ln:5 SELECT
> Ln:5 
> 
> [main] INFO org.apache.hive.jdbc.Utils - Supplied authorities:
> *
> [main] INFO org.apache.hive.jdbc.Utils - Resolved authority:
> 
> [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Open connection: j*
> 
> Starting query
> Query executed successfully (38.97 sec)
> Ln:5 SELECT completed successfully
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hive.hplsql.Select.getIntoCount(Select.java:405)
> at org.apache.hive.hplsql.Select.select(Select.java:88)
> at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:1021)
> at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:52)
> at org.apache.hive.hplsql.HplsqlParser$Select_stmtContext.accept(HplsqlParser.java:15050)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1013)
> at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:52)
> at org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1023)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
> at org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:454)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.Exec.visitBegin_end_block(Exec.java:930)
> at org.apache.hive.hplsql.Exec.visitBegin_end_block(Exec.java:52)
> at org.apache.hive.hplsql.HplsqlParser$Begin_end_blockContext.accept(HplsqlParser.java:549)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.HplsqlBaseVisitor.visitProc_block(HplsqlBaseVisitor.java:56)
> at org.apache.hive.hplsql.HplsqlParser$Proc_blockContext.accept(HplsqlParser.java:756)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
> at org.apache.hive.hplsql.functions.Function.visit(Function.java:754)
> at org.apache.hive.hplsql.functions.Function.execProc(Function.java:244)
> at org.apache.hive.hplsql.Exec.visitCall_stmt(Exec.java:1797)
> at org.apache.hive.hplsql.Exec.visitCall_stmt(Exec.java:52)
> at org.apache.hive.hplsql.HplsqlParser$Call_stmtContext.accept(HplsqlParser.java:3191)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1013)
> at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:52)
> at org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1023)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
> at org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
> at org.