Re: unsubscribe

2020-01-29 Thread Puneet Saha
unsubscribe

On Fri, Jan 17, 2020 at 11:39 AM Bruno S. de Barros wrote:

> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Re: Service Account not being honored using pyspark on Kubernetes

2020-01-29 Thread pisymbol .
On Wed, Jan 29, 2020 at 9:58 PM pisymbol .  wrote:

>
>
> On Wed, Jan 29, 2020 at 5:02 PM pisymbol .  wrote:
>
>>
>> The problem is that when Spark initializes I see the following error:
>>
>> io.fabric8.kubernetes.client.KubernetesClientException: pods is forbidden:
>> User "system:serviceaccount:default:default" cannot watch resource "pods"
>> in
>> API group "" in the namespace "spark"
>>
>>
> If I deploy my "driver" notebook pod in the spark namespace then things
> improve slightly:
>
> " Forbidden!Configured service account doesn't have access. Service
> account may have been revoked. pods is forbidden: User
> "system:serviceaccount:spark:default" cannot list resource "pods"
>
> Again, I don't want spark:default, I want spark:spark for the service
> account. Why aren't my configuration parameters taking effect?
>

For the poor soul who reads this thread and runs into the same issue, the
fix is to set the serviceAccount for the pod in your deployment to "spark".
I'm not sure why this has to be done, but it works.
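
In case it is useful, a minimal sketch for double-checking which service
account the driver pod actually ended up with. It assumes the "kubernetes"
Python client is installed in the notebook image, and the pod name below is
just a placeholder:

from kubernetes import client, config

# Running inside the cluster, so use the in-cluster config
config.load_incluster_config()
v1 = client.CoreV1Api()

# "jupyter-driver" is a placeholder for your notebook/driver pod name
pod = v1.read_namespaced_pod(name="jupyter-driver", namespace="spark")
print(pod.spec.service_account_name)  # should print "spark" after the fix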

-aps


Re: Service Account not being honored using pyspark on Kubernetes

2020-01-29 Thread pisymbol .
On Wed, Jan 29, 2020 at 5:02 PM pisymbol .  wrote:

>
> The problem is that when Spark initializes I see the following error:
>
> io.fabric8.kubernetes.client.KubernetesClientException: pods is forbidden:
> User "system:serviceaccount:default:default" cannot watch resource "pods"
> in
> API group "" in the namespace "spark"
>
>
If I deploy my "driver" notebook pod in the spark namespace then things
improve slightly:

" Forbidden!Configured service account doesn't have access. Service account
may have been revoked. pods is forbidden: User
"system:serviceaccount:spark:default" cannot list resource "pods"

Again, I don't want spark:default, I want spark:spark for the service
account. Why aren't my configuration parameters taking effect?

-aps


Re: Re: union two pyspark dataframes from different SparkSessions

2020-01-29 Thread Zong-han, Xie
Dear Yeikel

I checked my code and it uses getOrCreate to create a SparkSession.
Therefore, I should be retrieving the same SparkSession instance every time
I call that method.

Thanks for the reminder.

Best regards



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Service Account not being honored using pyspark on Kubernetes

2020-01-29 Thread pisymbol .
I am on k8s 1.17 in a small 4-node cluster. I am running Spark 2.4.4, but with
updated kubernetes-client jars to work around the 403 CVE issue.

I am running a Jupyter notebook in a pod in the 'default' namespace of my
cluster. I am trying to configure 'client mode' so I can use pyspark
interactively and watch work being done on the executors.

Here is my SparkConf:

from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://192.168.0.100:6443")
sparkConf.setAppName("pispark")
sparkConf.set("spark.kubernetes.container.image",
              "pidocker-docker-registry:5000/my-spark-py:v2.4.4")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "3")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName",
              "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.pullSecrets",
              "pidocker-docker-registry-secret")

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

The problem is that when Spark initializes I see the following error:

io.fabric8.kubernetes.client.KubernetesClientException: pods is forbidden:
User "system:serviceaccount:default:default" cannot watch resource "pods" in
API group "" in the namespace "spark"

But I am not using "default:default"; I am using "spark:spark", which has
"edit" access via a clusterrolebinding in that namespace:

$ k describe clusterrolebinding/spark-role -n spark
Name:         spark-role
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  edit
Subjects:
  Kind            Name   Namespace
  ----            ----   ---------
  ServiceAccount  spark  spark

What am I doing wrong?

-aps


Re: union two pyspark dataframes from different SparkSessions

2020-01-29 Thread yeikel valdes
From what I understand, the session is a singleton, so even if you think you
are creating new instances, you are just reusing it.
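
A small sketch (assuming a plain local pyspark install) that illustrates the
point:

from pyspark.sql import SparkSession

s1 = SparkSession.builder.appName("first").getOrCreate()
s2 = SparkSession.builder.appName("second").getOrCreate()

print(s1 is s2)                            # True: the second call reuses the first session
print(s1.sparkContext is s2.sparkContext)  # True: one SparkContext per process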




 On Wed, 29 Jan 2020 02:24:05 -1100 icbm0...@gmail.com wrote 


Dear all

I already have a Python function that queries data from HBase and HDFS with
given parameters. The function returns a pyspark dataframe and the
SparkContext it used.

With the client's increasing demands, I need to merge data from multiple
queries. I tested using the "union" function to merge the pyspark dataframes
returned by different function calls directly, and it worked. It surprised me
that a pyspark dataframe can actually union dataframes from different
SparkSessions.

I am using pyspark 2.3.1 and Python 3.5.

I wonder if this is good practice, or whether I should use the same
SparkSession for all the queries?

Best regards



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



union two pyspark dataframes from different SparkSessions

2020-01-29 Thread Zong-han, Xie
Dear all

I already have a Python function that queries data from HBase and HDFS with
given parameters. The function returns a pyspark dataframe and the
SparkContext it used.

With the client's increasing demands, I need to merge data from multiple
queries. I tested using the "union" function to merge the pyspark dataframes
returned by different function calls directly, and it worked. It surprised me
that a pyspark dataframe can actually union dataframes from different
SparkSessions.

I am using pyspark 2.3.1 and Python 3.5.

I wonder if this is good practice, or whether I should use the same
SparkSession for all the queries?
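
For illustration, a toy sketch of the pattern, with a dummy query function
standing in for my HBase/HDFS helper (the names are made up):

from pyspark.sql import SparkSession

def query(values):
    # Each call goes through getOrCreate(), so every call returns the same session
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(v,) for v in values], ["value"])
    return df, spark.sparkContext

df1, sc1 = query([1, 2])
df2, sc2 = query([3, 4])

merged = df1.union(df2)  # works because both DataFrames share one underlying session
merged.show()
print(sc1 is sc2)        # True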

Best regards



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Problems during upgrade 2.2.2 -> 2.4.4

2020-01-29 Thread bsikander
Anyone?
This question is not regarding my application running on top of Spark.
The question is about the upgrade of Spark itself from 2.2 to 2.4.

I expected at least that Spark would handle upgrades gracefully and recover
its own persisted objects.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org