Re: Thrift JDBC server - why only one per machine and only yarn-client
This is probably because the current Thrift server implementation runs its own `SparkContext` inside the server process (see: https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34 ). To support yarn-cluster mode, we would need to add a lot of functionality to deploy the Thrift server itself inside the cluster. However, it seems to me there are many technical issues around this.

// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov wrote:
> What about yarn-cluster mode?

--
---
Takeshi Yamamuro
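Takeshi's point can be made concrete. Because the server process builds its own `SparkContext` (via `SparkSQLEnv`), the Thrift server is itself the Spark driver, so it can only run where it was launched, which means client deploy mode. The Python sketch below is a hypothetical pre-flight check for a wrapper script around `start-thriftserver.sh`. The names `validate_thrift_launch` and `effective_deploy_mode` are our own and are not Spark APIs. As far as we know, `spark-submit` itself also refuses cluster deploy mode for this main class, but verify that on your Spark version.

```python
THRIFT_SERVER_CLASS = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"


def effective_deploy_mode(master: str, deploy_mode: str) -> str:
    """Spark 1.x also accepted the combined master strings 'yarn-cluster'/'yarn-client'."""
    if master == "yarn-cluster":
        return "cluster"
    if master == "yarn-client":
        return "client"
    return deploy_mode


def validate_thrift_launch(main_class: str, master: str, deploy_mode: str = "client") -> None:
    """Hypothetical wrapper check, not part of Spark.

    The Thrift server hosts its own SparkContext, so the JDBC endpoint and the
    driver are one JVM. In cluster mode YARN would start that JVM inside the
    cluster rather than on the launching host, which the server does not support.
    """
    mode = effective_deploy_mode(master, deploy_mode)
    if main_class == THRIFT_SERVER_CLASS and mode == "cluster":
        raise ValueError(
            f"{main_class} embeds its own SparkContext; use client deploy mode "
            f"(got master={master!r}, deploy_mode={deploy_mode!r})"
        )
```

For example, `validate_thrift_launch(THRIFT_SERVER_CLASS, "yarn")` passes, while `validate_thrift_launch(THRIFT_SERVER_CLASS, "yarn-cluster")` raises `ValueError`.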
Re: Thrift JDBC server - why only one per machine and only yarn-client
What about yarn-cluster mode?

2016-07-01 11:24 GMT-07:00 Egor Pahomov:
> Separate bad users with bad queries from good users with good queries.
> Spark does not provide any scope separation out of the box.

--
Sincerely yours,
Egor Pakhomov
Re: Thrift JDBC server - why only one per machine and only yarn-client
To separate bad users with bad queries from good users with good queries. Spark does not provide any scope separation out of the box.

2016-07-01 11:12 GMT-07:00 Jeff Zhang:
> I think so. Is there any reason you want to deploy multiple Thrift servers
> on one machine?

--
Sincerely yours,
Egor Pakhomov
Re: Thrift JDBC server - why only one per machine and only yarn-client
I think so. Is there any reason you want to deploy multiple Thrift servers on one machine?

On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov wrote:
> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
> Jeff, thanks, I will try, but from your answer I get the feeling that I'm
> trying some very rare case?

--
Best Regards

Jeff Zhang
Re: Thrift JDBC server - why only one per machine and only yarn-client
Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
Jeff, thanks, I will try, but from your answer I get the feeling that I'm trying some very rare case?

2016-07-01 10:54 GMT-07:00 Jeff Zhang:
> This is not a bug: both processes use the same SPARK_PID_DIR, which is /tmp
> by default. Although you can resolve this by using a different SPARK_PID_DIR,
> I suspect you would still have other issues, such as port conflicts. I would
> suggest deploying one Spark Thrift server per machine for now. If you want to
> stick with deploying multiple Spark Thrift servers on one machine, define a
> different SPARK_CONF_DIR, SPARK_LOG_DIR and SPARK_PID_DIR for each of your two
> instances. I'm not sure whether there are other conflicts, but please try it
> first.

--
Sincerely yours,
Egor Pakhomov
Re: Thrift JDBC server - why only one per machine and only yarn-client
This is not a bug: both processes use the same SPARK_PID_DIR, which is /tmp by default. Although you can resolve this by using a different SPARK_PID_DIR, I suspect you would still have other issues, such as port conflicts. I would suggest deploying one Spark Thrift server per machine for now. If you want to stick with deploying multiple Spark Thrift servers on one machine, define a different SPARK_CONF_DIR, SPARK_LOG_DIR and SPARK_PID_DIR for each of your two instances. I'm not sure whether there are other conflicts, but please try it first.

On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov wrote:
> I get
>
> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
> process 28989. Stop it first."
>
> Is it a bug?

--
Best Regards

Jeff Zhang
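Jeff's recipe can be scripted. The Python sketch below assumes a hypothetical layout where each instance owns `<base>/<name>/conf`, `<base>/<name>/logs` and `<base>/<name>/run`. It builds the environment for one instance, including the per-instance `HIVE_SERVER2_THRIFT_PORT` that Takeshi recommends in the thread, and the command line for the stock `sbin/start-thriftserver.sh`. The helper names and paths are ours, not Spark's. Each `SPARK_CONF_DIR` still has to be populated with your usual configuration files, for example `spark-defaults.conf` and `hive-site.xml`.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional


def instance_env(name: str, port: int, base: Path,
                 parent_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Environment for one co-located Thrift server instance.

    Each instance gets its own conf, log and pid directories under base/name,
    plus its own listening port. The layout under `base` is an assumption.
    """
    env = dict(os.environ if parent_env is None else parent_env)
    root = base / name
    env.update({
        "SPARK_CONF_DIR": str(root / "conf"),   # copy your Spark/Hive config files here
        "SPARK_LOG_DIR": str(root / "logs"),    # keeps the daemon .out files apart
        "SPARK_PID_DIR": str(root / "run"),     # avoids the shared /tmp pid file
        "HIVE_SERVER2_THRIFT_PORT": str(port),  # one JDBC port per instance
    })
    return env


def launch_command(spark_home: Path, master: str = "yarn") -> List[str]:
    """Command line for the stock start script (spark-submit defaults to client deploy mode)."""
    return [str(spark_home / "sbin" / "start-thriftserver.sh"), "--master", master]
```

Launching one instance would then look like `subprocess.run(launch_command(Path("/opt/spark")), env=instance_env("good-users", 10001, Path("/srv/sts")), check=True)`, with a second call using a different name and port.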
Re: Thrift JDBC server - why only one per machine and only yarn-client
As I said earlier, how about changing the bound port by using the `HIVE_SERVER2_THRIFT_PORT` environment variable?

// maropu

On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov wrote:
> I get
>
> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
> process 28989. Stop it first."
>
> Is it a bug?

--
---
Takeshi Yamamuro
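Before starting a second instance, a launcher can check that the port it will pass through `HIVE_SERVER2_THRIFT_PORT` (HiveServer2's default is 10000) is not already bound on the host. The Python sketch below uses only the standard `socket` module. `port_is_free` is a hypothetical helper name. The check is inherently racy, because another process can take the port between the check and the launch, so a bind failure at server start-up remains the authoritative signal.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding `port` on `host` currently succeeds.

    Racy by nature: the answer can change before the server actually starts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # fails with OSError (EADDRINUSE) if the port is taken
        except OSError:
            return False
        return True
```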
Re: Thrift JDBC server - why only one per machine and only yarn-client
I get

"org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 28989. Stop it first."

Is it a bug?

2016-07-01 10:10 GMT-07:00 Jeff Zhang:
> I don't think the one-instance-per-machine limitation is real. As long as
> you resolve conflicts such as the port, pid file, log file, etc., you can run
> multiple instances of the Spark Thrift server.

--
Sincerely yours,
Egor Pakhomov
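The message comes from the launcher's daemon script rather than from the server itself. As we read `sbin/spark-daemon.sh`, `start-thriftserver.sh` always launches instance number 1. The pid file is named `$SPARK_PID_DIR/spark-$SPARK_IDENT_STRING-<main class>-<instance>.pid`, with `SPARK_PID_DIR` defaulting to `/tmp` and `SPARK_IDENT_STRING` to `$USER`. Two launches by the same user therefore compute the same path, and the second one finds the first one's live process id. The Python sketch below re-derives that path under those assumptions (verify against the script shipped with your Spark version). `daemon_pid_file` is our own helper, not part of Spark.

```python
from pathlib import Path
from typing import Mapping

THRIFT_SERVER_CLASS = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"


def daemon_pid_file(env: Mapping[str, str], command: str = THRIFT_SERVER_CLASS,
                    instance: int = 1) -> Path:
    """Pid-file path as we believe spark-daemon.sh computes it.

    Assumed scheme: $SPARK_PID_DIR/spark-$SPARK_IDENT_STRING-$command-$instance.pid
    """
    pid_dir = env.get("SPARK_PID_DIR") or "/tmp"                   # script default
    ident = env.get("SPARK_IDENT_STRING") or env.get("USER", "")   # defaults to $USER
    return Path(pid_dir) / f"spark-{ident}-{command}-{instance}.pid"
```

With defaults, two launches by the same user both resolve to `/tmp/spark-<user>-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1.pid`. Giving each instance its own `SPARK_PID_DIR`, as Jeff suggests, changes the path.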
Re: Thrift JDBC server - why only one per machine and only yarn-client
I don't think the one-instance-per-machine limitation is real. As long as you resolve conflicts such as the port, pid file, log file, etc., you can run multiple instances of the Spark Thrift server.

On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov wrote:
> Hi, I'm using the Spark Thrift JDBC server and two limitations really bother
> me:
>
> 1) One instance per machine
> 2) yarn-client only (not yarn-cluster)
>
> Are there any architectural reasons for these limitations? I can understand
> yarn-client in theory: the master (driver) is the same process as the
> server, so it makes some sense, but it's really inconvenient, because I need
> a lot of memory on my driver machine. The reasons for one instance per
> machine I do not understand.

--
Best Regards

Jeff Zhang
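Jeff's checklist (port, pid file, log file) can be turned into a quick sanity check over planned instance environments. The Python sketch below reports every setting that two or more instances would share. Unset variables fall back to the defaults we believe apply (HiveServer2 port 10000, `/tmp` for pids, `$SPARK_HOME/logs` for logs, assuming all instances share one `SPARK_HOME`). Jeff's list ends with "etc.", so the key set is a starting point rather than a complete list. `find_conflicts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Mapping

# Settings that must differ between co-located instances, per the thread.
PER_INSTANCE_DEFAULTS = {
    "HIVE_SERVER2_THRIFT_PORT": "10000",  # HiveServer2 default port
    "SPARK_PID_DIR": "/tmp",              # spark-daemon.sh default
    "SPARK_LOG_DIR": "$SPARK_HOME/logs",  # placeholder: assumes one shared SPARK_HOME
}


def find_conflicts(instances: Mapping[str, Mapping[str, str]]) -> Dict[str, List[List[str]]]:
    """For each per-instance setting, list groups of instances sharing one value.

    Two instances that both leave SPARK_PID_DIR unset are reported as
    clashing, because both fall back to /tmp.
    """
    conflicts: Dict[str, List[List[str]]] = {}
    for key, default in PER_INSTANCE_DEFAULTS.items():
        by_value: Dict[str, List[str]] = defaultdict(list)
        for name, env in instances.items():
            by_value[env.get(key) or default].append(name)
        clashes = [sorted(names) for names in by_value.values() if len(names) > 1]
        if clashes:
            conflicts[key] = clashes
    return conflicts
```

An empty result means the three known sources of conflict are separated. Other clashes may remain, as Jeff warns.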
Thrift JDBC server - why only one per machine and only yarn-client
Hi, I'm using the Spark Thrift JDBC server and two limitations really bother me:

1) One instance per machine
2) yarn-client only (not yarn-cluster)

Are there any architectural reasons for these limitations? I can understand yarn-client in theory: the master (driver) is the same process as the server, so it makes some sense, but it's really inconvenient, because I need a lot of memory on my driver machine. The reasons for one instance per machine I do not understand.

--
Sincerely yours,
Egor Pakhomov