Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-28 Thread Kimoon Kim
Thanks for starting this discussion. When I was troubleshooting Spark on K8s, I often faced a need to turn on debug messages on the driver and executor pods of my jobs, which would be possible if I somehow put the right log4j.properties file inside the pods. I know I can build custom Docker

Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-22 Thread Matt Cheah
Re: Hadoop versioning – it seems reasonable enough for us to be publishing an image per Hadoop version. We should essentially have image configuration parity with what we publish as distributions on the Spark website. Sometimes jars need to be swapped out entirely instead of being strictly

Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-22 Thread Lalwani, Jayesh
I would like to add that many people run Spark behind corporate proxies. It’s very common to add an HTTP proxy to extraJavaOptions. Providing custom extraJavaOptions should be supported. Also, Hadoop FS 2.7.3 is pretty limited with respect to S3 buckets. You cannot use temporary AWS tokens. You
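As an illustration of the proxy point above, here is a minimal sketch of how proxy settings might be passed through extraJavaOptions. It assumes the standard JVM networking properties (`http.proxyHost`, `http.proxyPort`, and their `https` counterparts) and the existing `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions` settings. The helper names and the proxy host are hypothetical and are not something proposed in the thread.

```python
def proxy_java_options(host, port):
    """Build JVM flags that route HTTP and HTTPS traffic through one proxy."""
    return " ".join([
        f"-Dhttp.proxyHost={host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={host}",
        f"-Dhttps.proxyPort={port}",
    ])


def spark_submit_conf_args(host, port):
    """Return spark-submit --conf arguments for both driver and executors."""
    opts = proxy_java_options(host, port)
    return [
        "--conf", f"spark.driver.extraJavaOptions={opts}",
        "--conf", f"spark.executor.extraJavaOptions={opts}",
    ]


if __name__ == "__main__":
    # "proxy.example.com" is a placeholder host, not a real endpoint.
    for arg in spark_submit_conf_args("proxy.example.com", 8080):
        print(arg)
```

Whether a stock image would honor these options unchanged is exactly the kind of contract an image "API" would need to spell out.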

Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-22 Thread Rob Vesse
The difficulty with a custom Spark config is that you need to be careful that the Spark config the user provides does not conflict with the auto-generated portions of the Spark config necessary to make Spark on K8S work.  So part of any “API” definition might need to be what Spark config is
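The conflict concern above can be sketched as a simple check run before merging the two configs. This is only an illustration under stated assumptions: the function names are hypothetical, and which keys the back-end actually auto-generates is not stated in the thread, so the keys in the usage example are placeholders.

```python
def find_conflicts(user_conf, generated_conf):
    """Return keys the user set to a value that differs from the generated one."""
    return sorted(
        key for key, value in user_conf.items()
        if key in generated_conf and generated_conf[key] != value
    )


def merge_conf(user_conf, generated_conf):
    """Merge both configs, refusing to silently override generated settings."""
    conflicts = find_conflicts(user_conf, generated_conf)
    if conflicts:
        raise ValueError("user config conflicts with generated keys: "
                         + ", ".join(conflicts))
    merged = dict(user_conf)
    merged.update(generated_conf)
    return merged


if __name__ == "__main__":
    # Placeholder keys and values; the real auto-generated set is an assumption.
    generated = {"spark.driver.host": "driver-svc.default.svc"}
    user = {"spark.executor.memory": "4g"}
    print(merge_conf(user, generated))
```

Failing loudly on a conflict is one possible design choice; an "API" definition could instead document which keys are reserved and have them always win.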

Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-21 Thread Felix Cheung
I like being able to customize the Docker image itself - but I realize this thread is more about “API” for the stock image. Environment is nice. Probably we need a way to set custom Spark config (as a file?).

Re: Toward an "API" for spark images used by the Kubernetes back-end

2018-03-21 Thread Holden Karau
I’m glad this discussion is happening on dev@ :) Personally I like customizing with shell env variables when rolling my own image, but documenting the expectations/usage of the variables is definitely needed before we can really call it an API. On the related question I suspect two of the

Toward an "API" for spark images used by the Kubernetes back-end

2018-03-21 Thread Erik Erlandson
During the review of the recent PR to remove use of the init_container from kube pods as created by the Kubernetes back-end, the topic of documenting the "API" for these container images also came up. What information does the back-end provide to these containers? In what form? What assumptions