Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
Any help on this?

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jan 27, 2016 at 10:25 PM, @Sanjiv Singh 
wrote:

> Hi Ted,
> It's a typo.
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu  wrote:
>
>> In the last snippet, temptable is shown by 'show tables' command.
>> Yet you queried tampTable.
>>
>> I believe this was just a typo :-)
>>
>> On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have configured Spark to query a Hive table.
>>>
>>> I started the Thrift JDBC/ODBC server using the command below:
>>>
>>> *cd $SPARK_HOME*
>>> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
>>> hive.server2.thrift.bind.host=myhost --hiveconf
>>> hive.server2.thrift.port=*
>>>
>>> I am also able to connect through beeline:
>>>
>>> *beeline>* !connect jdbc:hive2://192.168.145.20:
>>> Enter username for jdbc:hive2://192.168.145.20:: root
>>> Enter password for jdbc:hive2://192.168.145.20:: impetus
>>> *beeline > *
>>>
>>> Queries on the Hive table return no result through Spark SQL JDBC, but the
>>> same query works with Spark's HiveContext. The complete scenario is explained below.
>>>
>>> Please help me understand why Spark SQL JDBC is not returning results.
>>>
>>> Below are version details.
>>>
>>> *Hive Version  : 1.2.1*
>>> *Hadoop Version :  2.6.0*
>>> *Spark version:  1.3.1*
>>>
>>> Let me know if you need other details.
>>>
>>>
>>> *Created a Hive table, inserted some records, and queried it:*
>>>
>>> *beeline> !connect jdbc:hive2://myhost:1*
>>> Enter username for jdbc:hive2://myhost:1: root
>>> Enter password for jdbc:hive2://myhost:1: **
>>> *beeline> create table tampTable(id int ,name string ) clustered by (id)
>>> into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
>>> *beeline> insert into table tampTable values
>>> (1,'row1'),(2,'row2'),(3,'row3');*
>>> *beeline> select name from tampTable;*
>>> name
>>> -
>>> row1
>>> row3
>>> row2
>>>
>>> *Query through Spark SQL HiveContext:*
>>>
>>> SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
>>> SparkContext sc = new SparkContext(sparkConf);
>>> HiveContext hiveContext = new HiveContext(sc);
>>> DataFrame teenagers = hiveContext.sql("SELECT name FROM tampTable");
>>> // Map each Row to a printable string, then collect the results on the driver.
>>> List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
>>>   @Override
>>>   public String call(Row row) {
>>>     return "Name: " + row.getString(0);
>>>   }
>>> }).collect();
>>> for (String name : teenagerNames) {
>>>   System.out.println(name);
>>> }
>>> teenagers.toJavaRDD().saveAsTextFile("/tmp1");
>>> sc.stop();
>>>
>>> This works perfectly and returns all names from table *tempTable*.
>>>
>>> *Query through Spark SQL JDBC :*
>>>
>>> *beeline> !connect jdbc:hive2://myhost:*
>>> Enter username for jdbc:hive2://myhost:: root
>>> Enter password for jdbc:hive2://myhost:: **
>>> *beeline> show tables;*
>>> *temptable*
>>> *..other tables*
>>> beeline> *SELECT name FROM tampTable;*
>>>
>>> I can list the table through "show tables", but when I run the query, it
>>> either hangs or returns nothing.
>>>
>>>
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>
>>
>
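
For anyone reproducing the JDBC path outside beeline, here is a minimal Java sketch of the same query over plain JDBC. It assumes the Hive JDBC driver is on the classpath. The class name ThriftServerQuery, its helper methods, the database name "default", the empty password, and port 10000 (the HiveServer2 default, used only because the port is not shown in this thread) are illustrative assumptions, not details from the original setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Spark or Hive.
class ThriftServerQuery {

    // Builds a HiveServer2-style JDBC URL such as jdbc:hive2://myhost:10000/default.
    static String buildUrl(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs a query and returns the first column of every row as a string.
    static List<String> fetchFirstColumn(String url, String user, String password, String sql)
            throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver explicitly; it must be on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Port 10000 is a placeholder: use the value given to
        // hive.server2.thrift.port when the Thrift server was started.
        String url = buildUrl("myhost", 10000, "default");
        for (String name : fetchFirstColumn(url, "root", "", "SELECT name FROM tampTable")) {
            System.out.println("Name: " + name);
        }
    }
}
```

If the executors never start, as turned out to be the case in this thread, executeQuery in this sketch would be expected to block in the same way the beeline session did, so the worker UI is still the place to look.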


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
Adding to it

Job status in the UI:

Stage Id: 1
Description: select ename from employeetest (collect at SparkPlan.scala:84)
Submitted: 2016/01/29 04:20:06
Duration: 3.0 min
Tasks (Succeeded/Total): 0/2

The Spark UI shows the stack trace below for this stage:

org.apache.spark.rdd.RDD.collect(RDD.scala:813)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
org.apache.spark.sql.DataFrame.collect(DataFrame.scala:887)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)


Regards
Sanjiv Singh
Mob :  +091 9990-447-339


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
It is working now.

I checked the Spark worker UI: executor startup was failing with the error
below. JVM initialization failed because of an invalid -Xms value:

Invalid initial heap size: -Xms0M
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

The Thrift server was not picking up the executor memory from *spark-env.sh*,
so I added it explicitly in the Thrift server startup script.

*./sbin/start-thriftserver.sh*

exec "$FWDIR"/sbin/spark-daemon.sh spark-submit $CLASS 1
--executor-memory 512M "$@"

With this change, executors start with a valid memory setting and JDBC
queries return results.

*conf/spark-env.sh* (executor memory settings that were not picked up by the
Thrift server):

export SPARK_JAVA_OPTS="-Dspark.executor.memory=512M"
export SPARK_EXECUTOR_MEMORY=512M
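
The -Xms0M in the error suggests the executor JVM was launched with a memory value of 0. The sketch below is a hypothetical illustration of that failure mode, not Spark's actual launcher code: the class ExecutorMemoryCheck, its method names, and the rule that an unset value maps to 0 MB (and that a bare number means megabytes) are all assumptions made for the example. It shows how an unset memory setting can turn into an invalid heap flag, and how a simple check would surface the problem earlier.

```java
import java.util.Locale;

// Hypothetical illustration; this is not Spark's launcher code.
class ExecutorMemoryCheck {

    // Converts strings such as "512M", "2g", or "1024" (assumed to be MB) to megabytes.
    // An unset or empty value maps to 0, modelling the situation in this thread.
    static int toMegabytes(String value) {
        if (value == null || value.trim().isEmpty()) {
            return 0;
        }
        String v = value.trim().toLowerCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            return Integer.parseInt(v);
        }
        int number = Integer.parseInt(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'm':
                return number;
            case 'g':
                return number * 1024;
            default:
                throw new IllegalArgumentException("unsupported unit in: " + value);
        }
    }

    // Formats the initial and maximum heap flags for a JVM command line.
    static String heapFlags(int memoryMb) {
        return "-Xms" + memoryMb + "M -Xmx" + memoryMb + "M";
    }

    // Same as heapFlags, but fails fast instead of emitting an invalid -Xms0M.
    static String checkedHeapFlags(String configured) {
        int mb = toMegabytes(configured);
        if (mb <= 0) {
            throw new IllegalStateException("executor memory is not set: " + configured);
        }
        return heapFlags(mb);
    }

    public static void main(String[] args) {
        System.out.println(heapFlags(toMegabytes(null)));
        System.out.println(checkedHeapFlags("512M"));
    }
}
```

Passing the memory explicitly, as the edited startup script above does with --executor-memory 512M, corresponds to the checkedHeapFlags("512M") case.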


Regards
Sanjiv Singh
Mob :  +091 9990-447-339


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-27 Thread @Sanjiv Singh
Hi Ted,
It's a typo.


Regards
Sanjiv Singh
Mob :  +091 9990-447-339



Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-27 Thread Ted Yu
In the last snippet, temptable is shown by 'show tables' command.
Yet you queried tampTable.

I believe this was just a typo :-)
