Re: Error in Hive on Spark

2016-03-22 Thread Stana
Hi, Xuefu

You are right.
Maybe I should launch spark-submit through HS2 or the Hive CLI instead?

Thanks a lot,
Stana
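
For reference, going through HS2 would mean submitting the query over JDBC instead of embedding org.apache.hadoop.hive.ql.Driver in the application; HS2 then launches spark-submit and ships hive-exec.jar to YARN itself. A minimal sketch of that route, assuming a reachable HiveServer2 instance (the host reuses the cluster name from this thread and the port is the HS2 default; both are assumptions, not values confirmed here):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveServer2Example {
        public static void main(String[] args) throws Exception {
            // Assumed HS2 endpoint; replace with the real host/port.
            String url = "jdbc:hive2://storm0:10000/default";
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {
                // HS2 launches spark-submit on its own host and takes care of hive-exec.jar.
                stmt.execute("set hive.execution.engine=spark");
                ResultSet rs = stmt.executeQuery(
                        "select * from hadoop0263_0 a join hadoop0263_0 b on (a.key = b.key)");
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }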


2016-03-22 1:16 GMT+08:00 Xuefu Zhang <xu...@uber.com>:

> Stana,
>
> I'm not sure I fully understand the problem. spark-submit is launched on
> the same host as your application, which should be able to access
> hive-exec.jar. The YARN cluster needs the jar as well, but HS2 or the Hive
> CLI will take care of that. Since you are using neither, it is your
> application's responsibility to make that happen.
>
> Did I miss anything?
>
> Thanks,
> Xuefu
>
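
A minimal sketch of what "making that happen" from the application side could look like: staging the local hive-exec jar onto the cluster's HDFS, the same way the Spark assembly jar is already staged in this thread. The namenode URI and local jar path below reuse values that appear later in the thread; whether staging alone is enough also depends on the jar path handed to spark-submit (discussed below), so treat this as an illustration rather than a confirmed fix:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StageHiveExecJar {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Namenode taken from the thread; the local path matches the one in the error log.
            FileSystem fs = FileSystem.get(URI.create("hdfs://storm0:9000"), conf);
            Path localJar = new Path(
                    "/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar");
            Path remoteJar = new Path("/tmp/hive-exec-2.0.0.jar");
            fs.copyFromLocalFile(localJar, remoteJar);
            System.out.println("Staged " + localJar + " to " + fs.makeQualified(remoteJar));
        }
    }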

Re: Error in Hive on Spark

2016-03-21 Thread Stana
Does anyone have suggestions for setting the hive-exec-2.0.0.jar path as a
property in the application?
Something like
'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.




Re: Error in Hive on Spark

2016-03-10 Thread Stana
Thanks for the reply.

I have already set the spark.home property in my application; without it,
the application throws a 'SPARK_HOME not found' exception.

I found this Hive source code in SparkClientImpl.java:

private Thread startDriver(final RpcServer rpcServer, final String clientId,
    final String secret) throws IOException {
  ...

  List<String> argv = Lists.newArrayList();

  ...

  argv.add("--class");
  argv.add(RemoteDriver.class.getName());

  String jar = "spark-internal";
  if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
    jar = SparkContext.jarOfClass(this.getClass()).get();
  }
  argv.add(jar);

  ...
}

When Hive executes spark-submit, it generates the shell command with
--class org.apache.hive.spark.client.RemoteDriver and sets the application
jar with SparkContext.jarOfClass(this.getClass()).get(), which resolves to
the local path of hive-exec-2.0.0.jar.

In my situation, the application and the YARN cluster are in different
clusters. When the application runs spark-submit against the YARN cluster
with the local path of hive-exec-2.0.0.jar, there is no hive-exec-2.0.0.jar
on the YARN cluster, so the application throws the exception:
"hive-exec-2.0.0.jar does not exist ...".

Can the hive-exec-2.0.0.jar path be set as a property in the application?
Something like 'hiveConf.set("hive.remote.driver.jar",
"hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
If not, could this be supported in a future version?



2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:

> You can probably avoid the problem by setting the environment variable
> SPARK_HOME or the JVM property spark.home to point to your Spark installation.
>
> --Xuefu
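
As a small illustration of this suggestion (the installation path below is a placeholder, not a value from this thread), the property can be set as a JVM system property before Hive builds its Spark client, or SPARK_HOME can be exported in the environment that launches the JVM:

    public class SparkHomeSetting {
        public static void configure() {
            // Sketch only: placeholder path to a local Spark 1.4.1 installation.
            System.setProperty("spark.home", "/opt/spark-1.4.1-bin-hadoop2.6");
            // Alternatively, export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6 before launching
            // the JVM, or set it on the HiveConf as the original poster does below.
        }
    }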

Error in Hive on Spark

2016-03-10 Thread Stana
I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, executing
org.apache.hadoop.hive.ql.Driver from a Java application.

My setup is as follows:
1. Build the Spark 1.4.1 assembly jar without Hive.
2. Upload the Spark assembly jar to the Hadoop cluster.
3. Run the Java application from the Eclipse IDE on my client computer.

The application works fine and submits the MR job to the YARN cluster
successfully when using hiveConf.set("hive.execution.engine", "mr"),
but it throws exceptions with the Spark engine.

Finally, I traced the Hive source code and came to this conclusion:

In my situation, the SparkClientImpl class generates the spark-submit
shell command and executes it. The command sets --class to
RemoteDriver.class.getName() and the application jar to
SparkContext.jarOfClass(this.getClass()).get(), which is why
my application throws the exception.

Is that right? And what can I do to run the application with the Spark
engine successfully from my client computer? Thanks a lot!


Java application code:

import org.apache.hadoop.hive.cli.CliSessionState;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.CommandNeedRetryException;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
import org.apache.hadoop.hive.ql.session.SessionState;

public class TestHiveDriver {

    private static HiveConf hiveConf;
    private static Driver driver;
    private static CliSessionState ss;

    public static void main(String[] args) {

        String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b on (a.key = b.key)";

        ss = new CliSessionState(new HiveConf(SessionState.class));
        hiveConf = new HiveConf(Driver.class);

        // Hadoop/YARN endpoints of the remote cluster
        hiveConf.set("fs.default.name", "hdfs://storm0:9000");
        hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
        hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
        hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
        hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
        hiveConf.set("mapreduce.framework.name", "yarn");
        hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");

        // Metastore connection
        hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
        hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
        hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");

        // Spark engine settings
        hiveConf.setBoolean("hive.auto.convert.join", false);
        hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
        hiveConf.set("spark.home", "target/spark");
        hiveConf.set("hive.execution.engine", "spark");
        hiveConf.set("hive.dbname", "default");

        driver = new Driver(hiveConf);
        SessionState.start(hiveConf);

        CommandProcessorResponse res = null;
        try {
            res = driver.run(sql);
        } catch (CommandNeedRetryException e) {
            e.printStackTrace();
        }

        System.out.println("Response Code:" + res.getResponseCode());
        System.out.println("Error Message:" + res.getErrorMessage());
        System.out.println("SQL State:" + res.getSQLState());
    }
}




Exception from the Spark engine:

16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
argv: 
/Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
--properties-file
/var/folders/vt/cjcdhms903x7brn1kbh558s4gn/T/spark-submit.7697089826296920539.properties
--class org.apache.hive.spark.client.RemoteDriver
/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
--remote-host MacBook-Pro.local --remote-port 51331 --conf
hive.spark.client.connect.timeout=1000 --conf
hive.spark.client.server.connect.timeout=9 --conf
hive.spark.client.channel.log.level=null --conf
hive.spark.client.rpc.max.size=52428800 --conf
hive.spark.client.rpc.threads=8 --conf
hive.spark.client.secret.bits=256
16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
16/03/10 18:33:09 INFO SparkClientImpl:  client token: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  diagnostics: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster host: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster RPC port: -1
16/03/10 18:33:09 INFO SparkClientImpl:  queue: default