Unsubscribe

2019-01-18 Thread Aditya Gautam



Unsubscribe

2019-01-18 Thread Huy Banh



Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall...



From: Serega Sheypak 
Sent: Friday, January 18, 2019 3:21 PM
To: user
Subject: Spark on Yarn, is it possible to manually blacklist nodes before 
running spark job?

Hi, is there any way to tell the scheduler to blacklist specific nodes in
advance?


Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Serega Sheypak
Hi, is there any way to tell the scheduler to blacklist specific nodes
in advance?
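
For reference, the closest built-in mechanism appears to be Spark's failure-driven
blacklisting (available since around Spark 2.1), which excludes executors and nodes
only after tasks fail on them, not before the job starts. A minimal sketch of those
settings, assuming they are set on the session before the job runs; the threshold
values are purely illustrative:

import org.apache.spark.sql.SparkSession

// Failure-driven blacklisting: executors/nodes are excluded only after
// repeated task failures, not up front.
val spark = SparkSession.builder()
  .appName("blacklist-sketch")
  .config("spark.blacklist.enabled", "true")
  // Blacklist an executor for a task after 1 failed attempt on it (illustrative).
  .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // Blacklist a node for a task after 2 failed attempts on it (illustrative).
  .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .getOrCreate()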


Rdd pipe Subprocess exit code

2019-01-18 Thread Mkal
When using rdd.pipe(script), I get the following error:

java.lang.IllegalStateException: Subprocess exited with status 132. Command
ran: ./script -h

I'm getting this while trying to run my external script with a simple "-h"
argument, to test that it's running smoothly through my Spark code.
When I run it as I ultimately intend to, which means with many more flags, I
get the same error but with exit status 1 instead of 132.

In the stack trace, there is no mention of the error that actually happened
in the command.
Checking the executor logs (via yarn logs -applicationId ), I only see
what's already stated in the stack trace.

Also note that the app runs correctly in standalone mode.
Does anyone have a suggested course I should follow to solve this?
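
For context, on Linux an exit status of 132 from a child process usually means it
was killed by signal 4 (SIGILL), which often points at the piped binary (e.g. built
for a different CPU than the executor hosts) rather than at Spark itself. Below is a
minimal sketch of how pipe() is typically wired up; the script path, flags and input
data are placeholders, not the original code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pipe-sketch").getOrCreate()
val sc = spark.sparkContext

// Each partition's elements are written to the subprocess's stdin, one per
// line; its stdout lines become the elements of the resulting RDD.
val input = sc.parallelize(Seq("a", "b", "c"))

// "./script -h" is a placeholder. The script must exist and be executable on
// every executor (e.g. shipped with --files), and must exit with status 0;
// a non-zero exit raises the IllegalStateException quoted above.
val output = input.pipe(Seq("./script", "-h"))
output.collect().foreach(println)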








Re: Question about RDD pipe

2019-01-18 Thread Mkal
Thanks a lot for the answer! It solved my problem.






Re: dataset best practice question

2019-01-18 Thread Mohit Jaggi
Thanks! I wanted to avoid repeating f1, f2, f3 in class B. I wonder whether
the encoders/decoders work if I use mixins.

On Tue, Jan 15, 2019 at 7:57 PM  wrote:

> Hi Mohit,
>
>
>
> I’m not sure that there is a “correct” answer here, but I tend to use
> classes whenever the input or output data represents something meaningful
> (such as a domain model object). I would recommend against creating many
> temporary classes for each and every transformation step as that may be
> difficult to maintain over time.
>
>
>
> Using *withColumn* statements will continue to work, and you don’t need
> to cast to your output class until you’ve set up all transformations.
> Therefore, you can do things like:
>
>
>
> case class A (f1, f2, f3)
>
> case class B (f1, f2, f3, f4, f5, f6)
>
>
>
> ds_a = spark.read.csv(“path”).as[A]
>
> ds_b = ds_a
>
>   .withColumn(“f4”, someUdf)
>
>   .withColumn(“f5”, someUdf)
>
>   .withColumn(“f6”, someUdf)
>
>   .as[B]
>
>
>
> Kevin
>
>
>
> *From:* Mohit Jaggi 
> *Sent:* Tuesday, January 15, 2019 1:31 PM
> *To:* user 
> *Subject:* dataset best practice question
>
>
>
> Fellow Spark Coders,
>
> I am trying to move from using Dataframes to Datasets for a reasonably
> large code base. Today the code looks like this:
>
>
>
> df_a = read_csv
>
> df_b = df_a.withColumn( some_transform_that_adds_more_columns )
>
> //repeat the above several times
>
>
>
> With datasets, this will require defining
>
>
>
> case class A(f1, f2, f3) // fields from the csv file
>
> case class B(f1, f2, f3, f4) // union of A and the new field added by
> some_transform_that_adds_more_columns
>
> //repeat this 10 times
>
>
>
> Is there a better way?
>
>
>
> Mohit.
>
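
A slightly more concrete version of the pattern Kevin describes; the field types,
the CSV options and the UDF below are made up for illustration, and the case classes
need to live at the top level (or in a spark-shell session) so that their encoders
can be derived:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

case class A(f1: String, f2: String, f3: String)
case class B(f1: String, f2: String, f3: String, f4: Int)

val spark = SparkSession.builder().appName("dataset-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for some_transform_that_adds_more_columns.
val f4Udf = udf((s: String) => s.length)

// Assumes the CSV has a header row naming the columns f1, f2, f3.
val dsA = spark.read.option("header", "true").csv("path").as[A]

// Stay in untyped DataFrame land while adding columns, then cast once at the end.
val dsB = dsA
  .withColumn("f4", f4Udf($"f1"))
  .as[B]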


[SPARK ON K8]: How do you configure executors to use the keytab inside their image on Kubernetes?

2019-01-18 Thread pokemonmaster9505
Hi,



I’m attempting to use Spark on Kubernetes to connect to a Kerberized Hadoop
cluster. While I’m able to successfully connect to the company’s Hive
tables and run queries on them, I’ve only managed to do this on a single
driver pod (with no executors). If I use any executor pods, the process
fails because the executors are not authenticating themselves with the
keytab, returning a SIMPLE authentication error instead. This is surprising
because the executors are using the same image as the driver and should,
therefore, have the keytab and XML config files inside them. The driver is
able to authenticate itself with the keytab because it’s running the
target JAR, which instructs it to do so. I can see that the executors are
not running processes from the JAR, but are instead running tasks that have
been delegated by the driver. Please have a look at my Stack Overflow
question, which contains all the details:



https://stackoverflow.com/questions/54181560/when-running-spark-on-kubernetes-to-access-kerberized-hadoop-cluster-how-do-you





My main references while trying to implement this architecture have been
the following:

   - https://github.com/apache/spark/blob/master/docs/security.md
   - https://www.slideshare.net/Hadoop_Summit/running-secured-spark-job-in-kubernetes-compute-cluster-and-integrating-with-kerberized-hdfs
   - https://www.iteblog.com/sparksummit2018/apache-spark-on-k8s-and-hdfs-security-with-ilan-flonenko-iteblog.pdf



Initially I attempted option 2 in the first link, but it just failed with the
same error. I’ve also tried following the second and third links: I attempted
to pass the keytab as a secret via one of the config parameters in the
spark-submit job (as described here:
https://spark.apache.org/docs/latest/running-on-kubernetes.html), but
unfortunately this also returns the same error.
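
For what it's worth, that secret-mounting approach from the running-on-kubernetes
page boils down to a matching pair of properties, so that the executor pods get the
same secret mount as the driver pod. A hedged sketch, where the secret name and
mount path are placeholders and the properties would normally be passed to
spark-submit as --conf flags (and which, by itself, did not resolve the error
described above):

import org.apache.spark.SparkConf

// Mount the Kubernetes secret "hadoop-keytab" (placeholder name) at
// /mnt/secrets in BOTH the driver and the executor pods, so the keytab
// is visible to the executors as well as the driver.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.secrets.hadoop-keytab", "/mnt/secrets")
  .set("spark.kubernetes.executor.secrets.hadoop-keytab", "/mnt/secrets")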



I would be grateful for any advice you can offer.



Thank you,

 Karan