Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Yinan Li
Yes, the PR allows you to set, say, 1.5. The new configuration property defaults to spark.executor.cores, which defaults to 1. On Fri, Mar 30, 2018, 3:03 PM Kimoon Kim wrote: > David, glad it helped! And thanks for your clear example. > > > The only remaining question would
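A minimal sketch of how the new knob might sit alongside the existing one, assuming the property name spark.kubernetes.executor.cores as discussed in this thread (the name ultimately merged for PR #20553 may differ):

    import org.apache.spark.SparkConf

    // Hypothetical: a fractional cpu request for executor pods, while task
    // scheduling still sees the integer spark.executor.cores (default 1).
    val conf = new SparkConf()
      .set("spark.executor.cores", "1")               // scheduler-visible cores
      .set("spark.kubernetes.executor.cores", "1.5")  // pod cpu request; defaults to spark.executor.cores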

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Kimoon Kim
David, glad it helped! And thanks for your clear example. > The only remaining question would then be what a sensible default for *spark.kubernetes.executor.cores* would be. Seeing that I wanted more than 1 and Yinan wants less, leaving it at 1 might be best. 1 as default SGTM. Thanks, Kimoon

Re: DataSourceV2 write input requirements

2018-03-30 Thread Ted Yu
+1 Original message From: Ryan Blue Date: 3/30/18 2:28 PM (GMT-08:00) To: Patrick Woody Cc: Russell Spitzer, Wenchen Fan, Ted Yu, Spark Dev List

Re: DataSourceV2 write input requirements

2018-03-30 Thread Ryan Blue
You're right. A global sort would change the clustering if the sort had more fields than the clustering. Then what about this: if there is no RequiredClustering, the sort is a global sort. If RequiredClustering is present, the clustering is applied and the sort is a partition-level sort.
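A rough sketch of the rule being proposed; RequiredClustering, GlobalSort, and ClusterThenSort are illustrative stand-ins, not the actual DataSourceV2 API:

    // Stand-in types for the proposal under discussion, not real Spark classes.
    case class RequiredClustering(columns: Seq[String])
    sealed trait WritePlan
    case class GlobalSort(order: Seq[String]) extends WritePlan
    case class ClusterThenSort(clustering: Seq[String], order: Seq[String]) extends WritePlan

    def planWrite(clustering: Option[RequiredClustering], order: Seq[String]): WritePlan =
      clustering match {
        case None    => GlobalSort(order)                 // no clustering: the sort is global
        case Some(c) => ClusterThenSort(c.columns, order) // cluster first, then sort within partitions
      }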

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread David Vogelbacher
Thanks for linking that PR, Kimoon. It actually does mostly address the issue I was referring to. As the issue I linked in my first email states, one physical cpu might not be enough to execute a task in a performant way. So if I set spark.executor.cores=1 and spark.task.cpus=1, I will get
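For illustration, a sketch of how the two existing settings interact: an executor runs spark.executor.cores / spark.task.cpus tasks concurrently, so raising spark.task.cpus gives each task a larger share of the executor's cpus (the values here are illustrative):

    import org.apache.spark.SparkConf

    // With these settings each executor runs 2 / 2 = 1 task at a time,
    // so a single task effectively gets two scheduler cores.
    val conf = new SparkConf()
      .set("spark.executor.cores", "2")
      .set("spark.task.cpus", "2")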

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Kimoon Kim
I see. Good to learn the interaction between spark.task.cpus and spark.executor.cores. But am I right to say that PR #20553 can still be used as an additional knob on top of those two? Say a user wants 1.5 cores per executor from Kubernetes, not the rounded-up integer value 2? > A relevant

Re: DataSourceV2 write input requirements

2018-03-30 Thread Patrick Woody
Does that methodology work in this specific case? I thought the ordering must be a subset of the clustering to guarantee rows end up in the same partition when doing a global sort. Though I get the gist that if the sort does satisfy the clustering, there is no reason not to choose the global sort. On Fri, Mar 30,

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Yinan Li
PR #20553 is more for allowing users to use a fractional value for cpu requests. The existing spark.executor.cores is sufficient for specifying more than one cpu. > One way to solve this could be to request more than 1 core from Kubernetes per task.

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Kimoon Kim
> Instead of requesting `[driver,executor].memory`, we should just request `[driver,executor].memory + [driver,executor].memoryOverhead`. I think this case is a bit clearer than the CPU case, so I went ahead and filed an issue with more details
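For illustration, the arithmetic behind the proposed request, assuming the standard memoryOverhead default of max(0.10 * memory, 384 MiB); the 4g figure is just an example:

    // spark.executor.memory=4g, no explicit spark.executor.memoryOverhead
    val executorMemoryMiB = 4096
    val overheadMiB = math.max((0.10 * executorMemoryMiB).toInt, 384) // 409 MiB
    val podMemoryRequestMiB = executorMemoryMiB + overheadMiB         // 4505 MiB requested from Kubernetes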

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Matt Cheah
The question is more about what the generally advised best practice is for setting CPU limits. It’s not immediately clear what a correct value is if one wants to provide consistent, guaranteed execution performance while also not degrading performance.

Re: DataSourceV2 write input requirements

2018-03-30 Thread Ryan Blue
> Can you expand on how the ordering containing the clustering expressions would ensure the global sort? The idea was basically to assume that if the clustering can be satisfied by a global sort, then do the global sort. For example, if the clustering is Set("b", "a") and the sort is Seq("a",
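A sketch of the satisfaction check implied by that example; the helper is hypothetical, comparing the clustering against a prefix of the sort order as sets:

    def sortSatisfiesClustering(clustering: Set[String], order: Seq[String]): Boolean =
      order.take(clustering.size).toSet == clustering

    sortSatisfiesClustering(Set("b", "a"), Seq("a", "b", "c"))  // true: prefix {a, b} covers the clustering
    sortSatisfiesClustering(Set("b", "a"), Seq("a", "c", "b"))  // false: prefix {a, c} does not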

Re: [Spark R] Proposal: Exposing RBackend in RRunner

2018-03-30 Thread Felix Cheung
Automatic reference counting should already be handled by SparkR. Can you elaborate on which object and how it would be used? From: Jeremy Liu Sent: Thursday, March 29, 2018 8:23:58 AM To: Reynold Xin Cc: Felix Cheung;

Re: DataSourceV2 write input requirements

2018-03-30 Thread Patrick Woody
> Right, you could use this to store a global ordering if there is only one write (e.g., CTAS). I don’t think anything needs to change in that case, you would still have a clustering and an ordering, but the ordering would need to include all fields of the clustering. A way to pass in the

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread Yinan Li
Hi David, Regarding cpu limits, in Spark 2.3 we do have the following config properties to specify cpu limits for the driver and executors; see http://spark.apache.org/docs/latest/running-on-kubernetes.html: spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores. On Thu, Mar
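A minimal sketch of setting those two properties (the values are illustrative; limits cap the pods' cpu usage, independently of the requests):

    import org.apache.spark.SparkConf

    // Hard cpu limits for the driver and executor pods on Kubernetes.
    val conf = new SparkConf()
      .set("spark.kubernetes.driver.limit.cores", "1")
      .set("spark.kubernetes.executor.limit.cores", "2")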