Re: Access multiple cluster

2016-12-05 Thread Steve Loughran
if the remote filesystem is visible from the other, then a different HDFS URI, 
e.g. hdfs://analytics:8000/historical/, can be used for reads & writes, 
even if your defaultFS (the one where you get max performance) is, say, 
hdfs://processing:8000/

- performance will be slower, in both directions
- if you have a fast pipe between the two clusters, then a job with many 
executors may unintentionally saturate the network, leading to unhappy people 
elsewhere.
- you'd better have mutual trust at the Kerberos layer. There's a configuration 
option (I forget its name) to give spark-submit a list of HDFS namenodes it 
will need to get tokens from. Unless your Spark cluster is being launched with 
keytabs, you will need to list upfront all the HDFS clusters your job intends 
to work with (see the sketch below).
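
A rough sketch of what that looks like, with placeholder cluster addresses and 
paths (on YARN the option for listing the extra namenodes is, I believe, 
spark.yarn.access.namenodes, but check the docs for your version):

// When kerberized, launch with the extra namenode listed so delegation tokens
// are fetched up front, e.g.
//   spark-submit --conf spark.yarn.access.namenodes=hdfs://analytics:8000 ...

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-cluster-read-write")
  .getOrCreate()

// Paths with a full hdfs:// authority go to the remote (analytics) cluster
val historical = spark.read.parquet("hdfs://analytics:8000/historical/events")

// Paths without an authority resolve against defaultFS, i.e. hdfs://processing:8000/
val current = spark.read.parquet("/data/events/current")

val joined = current.join(historical, Seq("event_id"))

// Writes work the same way: a fully qualified URI targets the remote cluster
joined.write.mode("overwrite").parquet("hdfs://analytics:8000/historical/joined")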

On 4 Dec 2016, at 21:45, ayan guha wrote:


Hi

Is it possible to access hive tables sitting on multiple clusters in a single 
spark application?

We have a data processing cluster and analytics cluster. I want to join a table 
from analytics cluster with another table in processing cluster and finally 
write back in analytics cluster.

Best
Ayan



Re: Access multiple cluster

2016-12-04 Thread ayan guha
Thank you guys. I will try the JDBC route if I get access and let you know.

On Mon, Dec 5, 2016 at 5:17 PM, Jörn Franke wrote:

> If you do it frequently then you may simply copy the data to the
> processing cluster. Alternatively, you could create an external table in
> the processing cluster pointing to the analytics cluster. However, this
> has to be supported by appropriate security configuration and might be
> less efficient than copying the data.
>
> On 4 Dec 2016, at 22:45, ayan guha  wrote:
>
> Hi
>
> Is it possible to access hive tables sitting on multiple clusters in a
> single spark application?
>
> We have a data processing cluster and analytics cluster. I want to join a
> table from analytics cluster with another table in processing cluster and
> finally write back in analytics cluster.
>
> Best
> Ayan
>
>


-- 
Best Regards,
Ayan Guha


Re: Access multiple cluster

2016-12-04 Thread Jörn Franke
If you do it frequently then you may simply copy the data to the processing 
cluster. Alternatively, you could create an external table in the processing 
cluster pointing to the analytics cluster. However, this has to be supported by 
appropriate security configuration and might be less efficient than copying 
the data.
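
A sketch of both options, with made-up database, table and path names 
(assuming a SparkSession with Hive support on the processing cluster):

// Option 1: copy the data over first, e.g. with distcp, and use it locally:
//   hadoop distcp hdfs://analytics:8000/warehouse/events hdfs://processing:8000/staging/events

// Option 2: an external table in the processing cluster's metastore whose
// data stays on the analytics cluster's HDFS (names and paths are placeholders)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("external-table-sketch")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS analytics_events (
    event_id BIGINT,
    payload  STRING
  )
  STORED AS PARQUET
  LOCATION 'hdfs://analytics:8000/warehouse/events'
""")

val remote = spark.table("analytics_events")

Either way the processing cluster still needs network access to, and Kerberos 
trust with, the analytics cluster's HDFS.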

> On 4 Dec 2016, at 22:45, ayan guha  wrote:
> 
> Hi
> 
> Is it possible to access hive tables sitting on multiple clusters in a single 
> spark application?
> 
> We have a data processing cluster and analytics cluster. I want to join a 
> table from analytics cluster with another table in processing cluster and 
> finally write back in analytics cluster.
> 
> Best
> Ayan


Re: Access multiple cluster

2016-12-04 Thread Mich Talebzadeh
The only way I can think of would be accessing the Hive tables through their
respective Thrift servers running on the different clusters, but I am not sure
you can do it within Spark. Basically, two different JDBC connections.
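
Roughly, something like this for the two JDBC connections (hosts, ports and 
table names are made up; whether Spark's JDBC source works cleanly against the 
Hive JDBC driver is something you would need to verify):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("two-thrift-servers")
  .getOrCreate()

// One JDBC connection per cluster's HiveServer2 / thrift server
val analyticsDf = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://analytics-host:10000/default")
  .option("dbtable", "historical_events")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

val processingDf = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://processing-host:10000/default")
  .option("dbtable", "current_events")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

val joined = processingDf.join(analyticsDf, Seq("event_id"))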

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





On 4 December 2016 at 21:45, ayan guha  wrote:

> Hi
>
> Is it possible to access hive tables sitting on multiple clusters in a
> single spark application?
>
> We have a data processing cluster and analytics cluster. I want to join a
> table from analytics cluster with another table in processing cluster and
> finally write back in analytics cluster.
>
> Best
> Ayan
>


Access multiple cluster

2016-12-04 Thread ayan guha
Hi

Is it possible to access Hive tables sitting on multiple clusters in a
single Spark application?

We have a data processing cluster and an analytics cluster. I want to join a
table from the analytics cluster with another table in the processing cluster
and finally write back to the analytics cluster.

Best
Ayan