Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-02 Thread Takeshi Yamamuro
This is probably because the current thrift-server implementation has a
`SparkContext` inside
(see:
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34
).
To support yarn-cluster, we would need to add a lot of functionality to deploy
the thrift-server itself in a cluster.
However, it seems to me there are many technical issues around this.

// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov  wrote:

> What about yarn-cluster mode?



-- 
---
Takeshi Yamamuro


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
What about yarn-cluster mode?

2016-07-01 11:24 GMT-07:00 Egor Pahomov :

> To separate bad users with bad queries from good users with good queries. Spark
> does not provide any scope separation out of the box.



-- 


*Sincerely yours, Egor Pakhomov*


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
To separate bad users with bad queries from good users with good queries. Spark
does not provide any scope separation out of the box.

2016-07-01 11:12 GMT-07:00 Jeff Zhang :

> I think so. Any reason you want to deploy multiple thrift servers on one
> machine?



-- 


*Sincerely yours, Egor Pakhomov*


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Jeff Zhang
I think so. Any reason you want to deploy multiple thrift servers on one
machine?

On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov 
wrote:

> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
> Jeff, thanks, I will try, but from your answer I'm getting the feeling
> that I'm attempting a very rare case?



-- 
Best Regards

Jeff Zhang


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
Jeff, thanks, I will try, but from your answer I'm getting the feeling
that I'm attempting a very rare case?

2016-07-01 10:54 GMT-07:00 Jeff Zhang :

> This is not a bug. It happens because these 2 processes use the same
> SPARK_PID_DIR, which is /tmp by default. Although you can resolve this by
> using a different SPARK_PID_DIR, I suspect you would still have other issues,
> like port conflicts. I would suggest you deploy one spark thrift server per
> machine for now. If you stick to deploying multiple spark thrift servers on
> one machine, then define a different SPARK_CONF_DIR, SPARK_LOG_DIR and
> SPARK_PID_DIR for each of your 2 instances of spark thrift server. Not sure
> if there are other conflicts, but please try first.



-- 


*Sincerely yours, Egor Pakhomov*


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Jeff Zhang
This is not a bug. It happens because these 2 processes use the same
SPARK_PID_DIR, which is /tmp by default. Although you can resolve this by
using a different SPARK_PID_DIR, I suspect you would still have other issues,
like port conflicts. I would suggest you deploy one spark thrift server per
machine for now. If you stick to deploying multiple spark thrift servers on
one machine, then define a different SPARK_CONF_DIR, SPARK_LOG_DIR and
SPARK_PID_DIR for each of your 2 instances of spark thrift server. Not sure
if there are other conflicts, but please try first.
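
The suggestion above can be sketched roughly as follows. This is only an
illustration, not a tested deployment recipe: the instance names, the ports
and the base directory are hypothetical, and only the four variable names
(SPARK_CONF_DIR, SPARK_LOG_DIR, SPARK_PID_DIR, HIVE_SERVER2_THRIFT_PORT) come
from this thread.

```python
import os
from pathlib import Path


def thrift_server_env(name: str, port: int, base_dir: str) -> dict:
    """Build a separate environment for one thrift server instance.

    `name`, `port` and `base_dir` are hypothetical inputs chosen for this
    sketch; each instance gets its own conf, log and pid directories.
    """
    root = Path(base_dir) / name
    env = dict(os.environ)  # start from the current environment
    env.update({
        "SPARK_CONF_DIR": str(root / "conf"),
        "SPARK_LOG_DIR": str(root / "logs"),
        "SPARK_PID_DIR": str(root / "pids"),
        "HIVE_SERVER2_THRIFT_PORT": str(port),
    })
    return env


if __name__ == "__main__":
    # Two hypothetical instances with non-overlapping directories and ports.
    for name, port in [("adhoc", 10001), ("reports", 10002)]:
        env = thrift_server_env(name, port, "/tmp/thrift-servers")
        print(name, env["SPARK_PID_DIR"], env["HIVE_SERVER2_THRIFT_PORT"])
```

Each returned dictionary could then be passed as `env=` to `subprocess.run`
when invoking `sbin/start-thriftserver.sh`. The directories would have to
exist first, and each SPARK_CONF_DIR would presumably need its own copy of
the configuration files.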


On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov 
wrote:

> I get
>
> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
> process 28989.  Stop it first."
>
> Is it a bug?



-- 
Best Regards

Jeff Zhang


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Takeshi Yamamuro
As said earlier, how about changing the bound port by using the env variable
`HIVE_SERVER2_THRIFT_PORT`?

// maropu

On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov 
wrote:

> I get
>
> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
> process 28989.  Stop it first."
>
> Is it a bug?



-- 
---
Takeshi Yamamuro


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
I get

"org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
process 28989.  Stop it first."

Is it a bug?

2016-07-01 10:10 GMT-07:00 Jeff Zhang :

> I don't think the one-instance-per-machine limit is real. As long as you
> resolve the conflicts, such as port, pid file, log file, etc., you can run
> multiple instances of spark thrift server.



-- 


*Sincerely yours, Egor Pakhomov*


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Jeff Zhang
I don't think the one-instance-per-machine limit is real. As long as you
resolve the conflicts, such as port, pid file, log file, etc., you can run
multiple instances of spark thrift server.
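
A rough way to check for two of the conflicts mentioned above (port and pid
file) before starting another instance is sketched below. The helper names
are made up for this illustration, and the pid file path is passed in by the
caller rather than derived, since the exact file name Spark uses is not
stated in this thread.

```python
import socket
from pathlib import Path


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def preflight(port: int, pid_file: str) -> list:
    """Collect human-readable conflicts for a planned instance."""
    problems = []
    if not port_is_free(port):
        problems.append(f"port {port} is already in use")
    if Path(pid_file).exists():
        problems.append(f"pid file {pid_file} already exists")
    return problems
```

An empty list from `preflight` only means these two checks passed; it says
nothing about log files or other possible conflicts.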

On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov  wrote:

> Hi, I'm using the Spark Thrift JDBC server and 2 limitations really bother
> me:
>
> 1) One instance per machine
> 2) Yarn client only (not yarn cluster)
>
> Are there any architectural reasons for such limitations? About
> yarn-client I might understand in theory: the master is the same process as
> the server, so it makes some sense, but it's really inconvenient, since I
> need a lot of memory on my driver machine. The reasons for one instance per
> machine I do not understand.



-- 
Best Regards

Jeff Zhang


Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
Hi, I'm using the Spark Thrift JDBC server and 2 limitations really bother
me:

1) One instance per machine
2) Yarn client only (not yarn cluster)

Are there any architectural reasons for such limitations? About yarn-client
I might understand in theory: the master is the same process as the server, so
it makes some sense, but it's really inconvenient, since I need a lot of memory
on my driver machine. The reasons for one instance per machine I do not
understand.

-- 


*Sincerely yours, Egor Pakhomov*