Re: thrift jdbc server probably running queries as hive query

2014-11-11 Thread Sadhan Sood
Hi Cheng,

I made sure the only hive server running on the machine is
hivethriftserver2.

/usr/lib/jvm/default-java/bin/java -cp
/usr/lib/hadoop/lib/hadoop-lzo.jar::/mnt/sadhan/spark-3/sbin/../conf:/mnt/sadhan/spark-3/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.2.jar:/etc/hadoop/conf
-Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master yarn
--jars reporting.jar spark-internal

The query I am running is a simple count(*): select count(*) from Xyz
where date_prefix=20141031, and I am pretty sure it is being submitted as a
MapReduce job, based on the Spark logs:

TakesRest=false

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=number

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=number

In order to set a constant number of reducers:

  set mapreduce.job.reduces=number

14/11/11 16:23:17 INFO ql.Context: New scratch dir is
hdfs://fdsfdsfsdfsdf:9000/tmp/hive-ubuntu/hive_2014-11-11_16-23-17_333_5669798325805509526-2

Starting Job = job_1414084656759_0142, Tracking URL =
http://xxx:8100/proxy/application_1414084656759_0142/

Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414084656759_0142
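Before digging further, it is worth checking which process actually owns the Thrift port: HiveServer2 and Spark's HiveThriftServer2 both listen on port 10000 by default, so a stock HiveServer2 that started first would silently receive Beeline's connections. A minimal check, assuming Linux and the default port:

```shell
# Who is bound to the default HiveServer2/Thrift port (10000)?
sudo lsof -iTCP:10000 -sTCP:LISTEN -P -n

# Which of the two server processes are running?
# A plain Hive "RunJar ... HiveServer2" process owning the port means Beeline
# is talking to Hive's server, not Spark's HiveThriftServer2.
ps aux | grep -E 'HiveServer2|HiveThriftServer2' | grep -v grep
```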

On Mon, Nov 10, 2014 at 9:59 PM, Cheng Lian <lian.cs@gmail.com> wrote:

  Hey Sadhan,

 I really don't think this is Spark log... Unlike Shark, Spark SQL doesn't
 even provide a Hive mode to let you execute queries against Hive. Would you
 please check whether there is an existing HiveServer2 running there? Spark
 SQL HiveThriftServer2 is just a Spark port of HiveServer2, and they share
 the same default listening port. I guess the Thrift server didn't start
 successfully because the HiveServer2 occupied the port, and your Beeline
 session was probably linked against HiveServer2.

 Cheng


 On 11/11/14 8:29 AM, Sadhan Sood wrote:

 I was testing out the Spark Thrift JDBC server by running a simple query
 in the Beeline client. Spark itself is running on a YARN cluster.

 However, when I run a query in Beeline, I see no running jobs in the
 Spark UI (completely empty), and the YARN UI seems to indicate that the
 submitted query is being run as a MapReduce job. This is probably also
 indicated by the Spark logs, but I am not completely sure:

  2014-11-11 00:19:00,492 INFO  ql.Context
 (Context.java:getMRScratchDir(267)) - New scratch dir is
 hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1

 2014-11-11 00:19:00,877 INFO  ql.Context
 (Context.java:getMRScratchDir(267)) - New scratch dir is
 hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

 2014-11-11 00:19:04,152 INFO  ql.Context
 (Context.java:getMRScratchDir(267)) - New scratch dir is
 hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

 2014-11-11 00:19:04,425 INFO  Configuration.deprecation
 (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.submit.replication
 is deprecated. Instead, use mapreduce.client.submit.file.replication

 2014-11-11 00:19:04,516 INFO  client.RMProxy
 (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
 at :8032

 2014-11-11 00:19:04,607 INFO  client.RMProxy
 (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
 at :8032

 2014-11-11 00:19:04,639 WARN  mapreduce.JobSubmitter
 (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option
 parsing not performed. Implement the Tool interface and execute your
 application with ToolRunner to remedy this

 2014-11-11 00:00:08,806 INFO  input.FileInputFormat
 (FileInputFormat.java:listStatus(287)) - Total input paths to process :
 14912

 2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
 (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library

 2014-11-11 00:00:08,866 INFO  lzo.LzoCodec (LzoCodec.java:<clinit>(76)) -
 Successfully loaded & initialized native-lzo library [hadoop-lzo rev
 8e266e052e423af592871e2dfe09d54c03f6a0e8]

 2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat
 (CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated node
 allocation with : CompletedNodes: 1, size left: 194541317

 2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter
 (JobSubmitter.java:submitJobInternal(396)) - number of splits:615

 2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter
 (JobSubmitter.java:printTokens(479)) - Submitting tokens for job:
 job_1414084656759_0115

 2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl
 (YarnClientImpl.java:submitApplication(167)) - Submitted application
 application_1414084656759_0115

 It seems like the query is being run as a Hive query instead of a Spark
 query. The same query works fine when run from the spark-sql CLI.

Re: thrift jdbc server probably running queries as hive query

2014-11-10 Thread Cheng Lian

Hey Sadhan,

I really don't think this is Spark log... Unlike Shark, Spark SQL 
doesn't even provide a Hive mode to let you execute queries against 
Hive. Would you please check whether there is an existing HiveServer2 
running there? Spark SQL HiveThriftServer2 is just a Spark port of 
HiveServer2, and they share the same default listening port. I guess the 
Thrift server didn't start successfully because the HiveServer2 occupied 
the port, and your Beeline session was probably linked against HiveServer2.


Cheng
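If an existing HiveServer2 does turn out to hold the default port, one workaround (a sketch; 10001 is an arbitrary free port, not anything from the original setup) is to start Spark's Thrift server on a different port and point Beeline at it explicitly:

```shell
# From the Spark installation directory; hive.server2.thrift.port overrides
# the default 10000 so there is no clash with a stock HiveServer2.
./sbin/start-thriftserver.sh --master yarn \
  --hiveconf hive.server2.thrift.port=10001

# Connect Beeline to that port, leaving no ambiguity about which server answers.
beeline -u jdbc:hive2://localhost:10001
```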

On 11/11/14 8:29 AM, Sadhan Sood wrote:
I was testing out the Spark Thrift JDBC server by running a simple
query in the Beeline client. Spark itself is running on a YARN
cluster.


However, when I run a query in Beeline, I see no running jobs in the
Spark UI (completely empty), and the YARN UI seems to indicate that the
submitted query is being run as a MapReduce job. This is probably
also indicated by the Spark logs, but I am not completely sure:


2014-11-11 00:19:00,492 INFO  ql.Context 
(Context.java:getMRScratchDir(267)) - New scratch dir is 
hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1


2014-11-11 00:19:00,877 INFO  ql.Context 
(Context.java:getMRScratchDir(267)) - New scratch dir is 
hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2


2014-11-11 00:19:04,152 INFO  ql.Context 
(Context.java:getMRScratchDir(267)) - New scratch dir is 
hdfs://:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2


2014-11-11 00:19:04,425 INFO Configuration.deprecation 
(Configuration.java:warnOnceIfDeprecated(1009)) - 
mapred.submit.replication is deprecated. Instead, use 
mapreduce.client.submit.file.replication


2014-11-11 00:19:04,516 INFO  client.RMProxy 
(RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager 
at :8032


2014-11-11 00:19:04,607 INFO  client.RMProxy 
(RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager 
at :8032


2014-11-11 00:19:04,639 WARN mapreduce.JobSubmitter 
(JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line 
option parsing not performed. Implement the Tool interface and execute 
your application with ToolRunner to remedy this


2014-11-11 00:00:08,806 INFO  input.FileInputFormat 
(FileInputFormat.java:listStatus(287)) - Total input paths to process 
: 14912


2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
(GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library


2014-11-11 00:00:08,866 INFO  lzo.LzoCodec
(LzoCodec.java:<clinit>(76)) - Successfully loaded & initialized
native-lzo library [hadoop-lzo rev
8e266e052e423af592871e2dfe09d54c03f6a0e8]


2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat 
(CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated 
node allocation with : CompletedNodes: 1, size left: 194541317


2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(396)) - number of splits:615


2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter 
(JobSubmitter.java:printTokens(479)) - Submitting tokens for job: 
job_1414084656759_0115


2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl 
(YarnClientImpl.java:submitApplication(167)) - Submitted application 
application_1414084656759_0115



It seems like the query is being run as a Hive query instead of a Spark
query. The same query works fine when run from the spark-sql CLI.
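A quick way to confirm which engine a Beeline session is actually using is to ask it for a query plan: Spark SQL prints a Spark physical plan, while Hive's HiveServer2 prints MapReduce stage plans. A sketch, reusing the count(*) query from this thread:

```shell
# -u gives the JDBC URL, -e runs a single statement and exits.
beeline -u jdbc:hive2://localhost:10000 -e \
  "EXPLAIN SELECT count(*) FROM Xyz WHERE date_prefix = 20141031"
# Output describing "Map Reduce" stages points at Hive; a Spark physical
# plan (Aggregate/Exchange operators) points at the Spark Thrift server.
```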