hivethriftserver2 problems on upgrade to 1.6.0

2016-01-25 Thread james.gre...@baesystems.com
On upgrading from 1.5.0 to 1.6.0 I have a problem with HiveThriftServer2. I 
have this code:

val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))

val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")

thing.registerTempTable("thing")

HiveThriftServer2.startWithContext(hiveContext)


When I start things up on the cluster my hive-site.xml is found – I can see 
that the metastore connects:


INFO  metastore - Trying to connect to metastore with URI 
thrift://dkclusterm2.imp.net:9083
INFO  metastore - Connected to metastore.


But later on the Thrift server seems not to connect to the remote Hive 
metastore, and starts a Derby instance instead:

INFO  AbstractService - Service:CLIService is started.
INFO  ObjectStore - ObjectStore, initialize called
INFO  Query - Reading in results for query 
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is 
closing
INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO  ObjectStore - Initialized ObjectStore
INFO  HiveMetaStore - 0: get_databases: default
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=get_databases: 
default
INFO  HiveMetaStore - 0: Shutting down the object store...
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Shutting down the 
object store...
INFO  HiveMetaStore - 0: Metastore shutdown complete.
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Metastore shutdown 
complete.
INFO  AbstractService - Service:ThriftBinaryCLIService is started.
INFO  AbstractService - Service:HiveServer2 is started.


So if I connect to this with JDBC I can see all the tables on the Hive server – 
but not any temporary tables – I guess they are going to Derby.
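
For reference, the JDBC client I am using is essentially the following sketch 
(the host, port 10000 and user are illustrative – adjust to your deployment):

```scala
// Minimal JDBC client against the Spark Thrift server.
// Host, port and user below are assumptions, not from my actual setup.
import java.sql.DriverManager

object ThriftQuery {
  def main(args: Array[String]): Unit = {
    // Register the Hive JDBC driver.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://dkclusterm2.imp.net:10000/default", "jegreen1", "")
    try {
      // Temp tables registered via registerTempTable should show up here.
      val rs = conn.createStatement().executeQuery("SHOW TABLES")
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
```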

I see someone on the databricks website is also having this problem.


Thanks

James






From: patcharee [mailto:patcharee.thong...@uni.no]
Sent: 25 January 2016 14:31
To: user@spark.apache.org
Cc: Eirik Thorsnes
Subject: streaming textFileStream problem - got only ONE line

Hi,

My streaming application receives data from the file system and just prints the 
input count every 1-second interval, as in the code below:

val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Milliseconds(interval_ms))
val lines = ssc.textFileStream(args(0))
lines.count().print()

The problem is that sometimes the data received from ssc.textFileStream is ONLY 
ONE line, even though there are multiple lines in the new file found in that 
interval. The log below shows three intervals. In the 2nd interval, the 
new file is hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt. This 
file contains 6288 lines, but ssc.textFileStream returns ONLY ONE line (the 
header).

Any ideas/suggestions what the problem is?
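
Could it be related to how the files are written? The Spark Streaming file 
stream documentation says files must appear in the monitored directory 
atomically (created elsewhere and then moved/renamed in); if a file is picked 
up while still being written, only the part flushed so far (e.g. the header) 
would be read. A sketch of the write-then-move pattern I could try (directory 
names are illustrative; on HDFS the equivalent would be hdfs dfs -put to a 
staging path followed by hdfs dfs -mv):

```shell
# Write the file in full to a staging directory, then rename it into the
# monitored directory in one step (mv within one filesystem is atomic).
set -e
STAGING=$(mktemp -d)
WATCHED=$(mktemp -d)

# 1. Write the complete file outside the monitored directory.
printf 'header\nrow1\nrow2\n' > "$STAGING/datetime_19617.txt"

# 2. Move it in atomically; textFileStream then only ever sees a complete file.
mv "$STAGING/datetime_19617.txt" "$WATCHED/datetime_19617.txt"

wc -l < "$WATCHED/datetime_19617.txt"
```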

-
SPARK LOG
-

16/01/25 15:11:11 INFO FileInputDStream: Cleared 1 old files that were older 
than 1453731011000 ms: 145373101 ms
16/01/25 15:11:11 INFO FileInputDStream: Cleared 0 old files that were older 
than 1453731011000 ms:
16/01/25 15:11:12 INFO FileInputDStream: Finding new files took 4 ms
16/01/25 15:11:12 INFO FileInputDStream: New files at time 1453731072000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19616.txt
---
Time: 1453731072000 ms
---
6288

16/01/25 15:11:12 INFO FileInputDStream: Cleared 1 old files that were older 
than 1453731012000 ms: 1453731011000 ms
16/01/25 15:11:12 INFO FileInputDStream: Cleared 0 old files that were older 
than 1453731012000 ms:
16/01/25 15:11:13 INFO FileInputDStream: Finding new files took 4 ms
16/01/25 15:11:13 INFO FileInputDStream: New files at time 1453731073000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt
---
Time: 1453731073000 ms
---
1

16/01/25 15:11:13 INFO FileInputDStream: Cleared 1 old files that were older 
than 1453731013000 ms: 1453731012000 ms
16/01/25 15:11:13 INFO FileInputDStream: Cleared 0 old files that were older 
than 1453731013000 ms:
16/01/25 15:11:14 INFO FileInputDStream: Finding new files took 3 ms
16/01/25 15:11:14 INFO FileInputDStream: New files at time 1453731074000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19618.txt
---
Time: 1453731074000 ms
---
6288


Thanks,
Patcharee

Re: hivethriftserver2 problems on upgrade to 1.6.0

2016-01-27 Thread Deenar Toraskar
James

The problem you are facing is due to a feature introduced in Spark 1.6:
multi-session mode. If you want to see temporary tables across sessions,
set spark.sql.hive.thriftServer.singleSession=true


   - From Spark 1.6, by default the Thrift server runs in multi-session
   mode, which means each JDBC/ODBC connection owns a copy of its own SQL
   configuration and temporary function registry. Cached tables are still
   shared, though. If you prefer to run the Thrift server in the old
   single-session mode, set the option
   spark.sql.hive.thriftServer.singleSession to true. You may either add
   this option to spark-defaults.conf, or pass it to start-thriftserver.sh
   via --conf:

./sbin/start-thriftserver.sh \
 --conf spark.sql.hive.thriftServer.singleSession=true \
 ...
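
Since you are embedding the server with HiveThriftServer2.startWithContext, you 
should equally be able to set it on the SparkConf before the context is created. 
A sketch mirroring your snippet (I have not verified this exact form):

```scala
// Sketch: enabling single-session mode when embedding the Thrift server.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val conf = new SparkConf()
  .set("spark.sql.hive.thriftServer.singleSession", "true")
val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))
// Temp tables registered on hiveContext should now be visible to all
// JDBC/ODBC sessions of the embedded server.
HiveThriftServer2.startWithContext(hiveContext)
```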


On 25 January 2016 at 15:06, james.gre...@baesystems.com <
james.gre...@baesystems.com> wrote: