[jira] [Updated] (SPARK-36473) Add non-unified memory metrics
[ https://issues.apache.org/jira/browse/SPARK-36473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-36473: -- Description: Spark splits memory into managed memory (storage + execution) and unmanaged memory, and uses `spark.memory.fraction` to control the managed share of the heap. But we have no metrics showing how much unmanaged memory is actually used, so there is no data to guide tuning of `spark.memory.fraction`. We should add such metrics. > Add non-unified memory metrics > -- > > Key: SPARK-36473 > URL: https://issues.apache.org/jira/browse/SPARK-36473 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Spark splits memory into managed memory (storage + execution) and unmanaged > memory, and uses `spark.memory.fraction` to control the managed share of the > heap. > But we have no metrics showing how much unmanaged memory is actually used, > so there is no data to guide tuning of `spark.memory.fraction`. > We should add such metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
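For context on what "managed" vs "unmanaged" means here, a minimal sketch of the unified-memory split, assuming Spark's documented defaults (a fixed 300 MB reservation and `spark.memory.fraction` = 0.6; the actual constants live in Spark's `UnifiedMemoryManager`):

```python
# Sketch of Spark's unified-memory split, assuming the documented defaults:
# 300 MB is reserved, then spark.memory.fraction of the remainder is managed
# (storage + execution); the rest is the "unmanaged" memory this ticket
# wants metrics for.

RESERVED_BYTES = 300 * 1024 * 1024  # reserved system memory

def memory_split(heap_bytes, memory_fraction=0.6):
    usable = heap_bytes - RESERVED_BYTES
    managed = int(usable * memory_fraction)   # storage + execution
    unmanaged = usable - managed              # user / unmanaged memory
    return managed, unmanaged

managed, unmanaged = memory_split(4 * 1024**3)  # e.g. a 4 GiB executor heap
```

Everything outside the managed region (user data structures, internal metadata, JVM overhead) is the part the ticket proposes to measure.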
[jira] [Commented] (SPARK-36476) cloudpickle: ValueError: Cell is empty
[ https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397126#comment-17397126 ] Oliver Mannion commented on SPARK-36476: Looks like a similar issue raised here https://github.com/cloudpipe/cloudpickle/issues/393 > cloudpickle: ValueError: Cell is empty > -- > > Key: SPARK-36476 > URL: https://issues.apache.org/jira/browse/SPARK-36476 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.2 >Reporter: Oliver Mannion >Priority: Major > > {code:java} > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py", > line 437, in dumps > return cloudpickle.dumps(obj, pickle_protocol) > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", > line 101, in dumps > cp.dump(obj) > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", > line 540, in dump > return Pickler.dump(self, obj) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 437, in dump > self.save(obj) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 504, in save > f(self, obj) # Call unbound method with explicit self > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 789, in save_tuple > save(element) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 504, in save > f(self, obj) # Call unbound method with explicit self > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", > line 722, in save_function > *self._dynamic_function_reduce(obj), obj=obj > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", > line 659, in _save_reduce_pickle5 > dictitems=dictitems, obj=obj > File 
"/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 638, in save_reduce > save(args) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 504, in save > f(self, obj) # Call unbound method with explicit self > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 789, in save_tuple > save(element) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 504, in save > f(self, obj) # Call unbound method with explicit self > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 774, in save_tuple > save(element) > File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line > 504, in save > f(self, obj) # Call unbound method with explicit self > File > "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py", > line 1226, in save_cell > f = obj.cell_contents > ValueError: Cell is empty > {code} > Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was > upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094). > Also doesn't occur in Spark 3.1.2 with python 3.8.
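The `ValueError: Cell is empty` at the bottom of the trace comes from reading `cell_contents` on a closure cell that never received a value. Outside Spark entirely, a plain-Python sketch of how such a cell arises (the `make`/`inner` names are hypothetical, not from the issue):

```python
# A closure cell with no value: `inner` closes over `x`, but the branch
# that would assign `x` never runs, so the cell stays empty.
def make():
    def inner():
        return x
    if False:
        x = 1  # never executed, so the cell for `x` is never filled
    return inner

f = make()
cell = f.__closure__[0]
try:
    cell.cell_contents  # raises ValueError, same as in the traceback
except ValueError as e:
    print(e)
```

A pickler that walks `__closure__` and unconditionally reads `cell_contents` (as in the `save_cell` frame above) hits exactly this error on such functions.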
[jira] [Updated] (SPARK-36476) cloudpickle: ValueError: Cell is empty
[ https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oliver Mannion updated SPARK-36476: --- Description: {code:java} File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps return cloudpickle.dumps(obj, pickle_protocol) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 101, in dumps cp.dump(obj) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump return Pickler.dump(self, obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 437, in dump self.save(obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 722, in save_function *self._dynamic_function_reduce(obj), obj=obj File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 659, in _save_reduce_pickle5 dictitems=dictitems, obj=obj File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 638, in save_reduce save(args) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save 
f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 774, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell f = obj.cell_contents ValueError: Cell is empty {code} Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094). Also doesn't occur in Spark 3.1.2 with python 3.8. was: {code:java} File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps return cloudpickle.dumps(obj, pickle_protocol) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 101, in dumps cp.dump(obj) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump return Pickler.dump(self, obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 437, in dump self.save(obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 722, in save_function *self._dynamic_function_reduce(obj), obj=obj File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", 
line 659, in _save_reduce_pickle5 dictitems=dictitems, obj=obj File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 638, in save_reduce save(args) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 774, in save_tuple save(element) File
[jira] [Created] (SPARK-36476) cloudpickle: ValueError: Cell is empty
Oliver Mannion created SPARK-36476: -- Summary: cloudpickle: ValueError: Cell is empty Key: SPARK-36476 URL: https://issues.apache.org/jira/browse/SPARK-36476 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.1.2 Reporter: Oliver Mannion {code:java} File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps return cloudpickle.dumps(obj, pickle_protocol) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 101, in dumps cp.dump(obj) File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump return Pickler.dump(self, obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 437, in dump self.save(obj) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 722, in save_function *self._dynamic_function_reduce(obj), obj=obj File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 659, in _save_reduce_pickle5 dictitems=dictitems, obj=obj File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 638, in save_reduce save(args) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple save(element) File 
"/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 774, in save_tuple save(element) File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell f = obj.cell_contents ValueError: Cell is empty {code} Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094)
[jira] [Created] (SPARK-36475) Add doc about spark.shuffle.service.fetch.rdd.enabled
angerszhu created SPARK-36475: - Summary: Add doc about spark.shuffle.service.fetch.rdd.enabled Key: SPARK-36475 URL: https://issues.apache.org/jira/browse/SPARK-36475 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.2.0 Reporter: angerszhu Add doc about spark.shuffle.service.fetch.rdd.enabled
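For context, enabling RDD block fetches from the external shuffle service involves configuration along these lines (a sketch of what the requested doc would cover; exact names and defaults should be checked against the Spark configuration docs):

```properties
# The external shuffle service must be enabled for the flag below to matter.
spark.shuffle.service.enabled              true
# Let the external shuffle service also serve persisted (disk) RDD blocks,
# so executors holding only cached blocks can be released under
# dynamic allocation.
spark.shuffle.service.fetch.rdd.enabled    true
```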
[jira] [Commented] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397105#comment-17397105 ] Apache Spark commented on SPARK-35881: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33701 > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 3.2.0, 3.3.0 > > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection and then determine if that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection.
[jira] [Commented] (SPARK-25888) Service requests for persist() blocks via external service after dynamic deallocation
[ https://issues.apache.org/jira/browse/SPARK-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397102#comment-17397102 ] Apache Spark commented on SPARK-25888: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33700 > Service requests for persist() blocks via external service after dynamic > deallocation > - > > Key: SPARK-25888 > URL: https://issues.apache.org/jira/browse/SPARK-25888 > Project: Spark > Issue Type: New Feature > Components: Block Manager, Shuffle, Spark Core, YARN >Affects Versions: 2.3.2 >Reporter: Adam Kennedy >Priority: Major > Labels: bulk-closed > > Large and highly multi-tenant Spark on YARN clusters with diverse job > execution often display terrible utilization rates (we have observed as low > as 3-7% CPU at max container allocation, but 50% CPU utilization on even a > well policed cluster is not uncommon). > As a sizing example, consider a scenario with 1,000 nodes, 50,000 cores, 250 > users and 50,000 runs of 1,000 distinct applications per week, with > predominantly Spark including a mixture of ETL, Ad Hoc tasks and PySpark > Notebook jobs (no streaming) > Utilization problems appear to be due in large part to difficulties with > persist() blocks (DISK or DISK+MEMORY) preventing dynamic deallocation. > In situations where an external shuffle service is present (which is typical > on clusters of this type) we already solve this for the shuffle block case by > offloading the IO handling of shuffle blocks to the external service, > allowing dynamic deallocation to proceed. > Allowing Executors to transfer persist() blocks to some external "shuffle" > service in a similar manner would be an enormous win for Spark multi-tenancy > as it would limit deallocation blocking scenarios to only MEMORY-only cache() > scenarios. 
> I'm not sure if I'm correct, but I seem to recall from the original > external shuffle service commits that this may have been considered at the > time, but moving shuffle blocks to the external shuffle service was the > first priority. > With support for external persist() DISK blocks in place, we could also then > handle deallocation of DISK+MEMORY, as the memory instance could first be > dropped, changing the block to DISK only, and then further transferred to the > shuffle service. > We have tried to resolve the persist() issue via extensive user training, but > that has typically only allowed us to improve utilization of the worst > offenders (10% utilization) up to around 40-60% utilization, as the need for > persist() is often legitimate and occurs during the middle stages of a job. > In a healthy multi-tenant scenario, a large job might spool up to say 10,000 > cores, persist() data, release executors across a long tail down to 100 > cores, and then spool back up to 10,000 cores for the following stage without > impact on the persist() data. > In an ideal world, if a new executor started up on a node on which blocks > had been transferred to the shuffle service, the new executor might even be > able to "recapture" control of those blocks (if that would help with > performance in some way). > And the behavior of gradually expanding up and down several times over the > course of a job would not just improve utilization, but would allow resources > to more easily be redistributed to other jobs which start on the cluster > during the long-tail periods, which would improve multi-tenancy and bring us > closer to optimal "envy free" YARN scheduling.
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397100#comment-17397100 ] Wenchen Fan commented on SPARK-34827: - Marking as a known issue LGTM. This is not a blocker to me as it only affects performance, not a correctness issue. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker
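The known-issue workaround implied here can be sketched as configuration. This assumes the AQE in-batch fetch flag is `spark.sql.adaptive.fetchShuffleBlocksInBatch`; verify the exact name against the Spark configuration docs for your version:

```properties
# With I/O encryption enabled, shuffle blocks are encrypted individually,
# which is what makes fetching them in batch the problematic combination
# this issue tracks.
spark.io.encryption.enabled                     true
# Workaround until batch fetch supports encryption: disable in-batch fetch.
spark.sql.adaptive.fetchShuffleBlocksInBatch    false
```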
[jira] [Resolved] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36456. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33682 [https://github.com/apache/spark/pull/33682] > Clean up the depredation usage of o.a.c.io.IOUtils.closeQuietly > > > Key: SPARK-36456 > URL: https://issues.apache.org/jira/browse/SPARK-36456 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > Compilation warnings related to `method closeQuietly in class IOUtils is > deprecated` are as follows: > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: > [deprecation @ > org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: > [deprecation @ > org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > 
/spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: > [deprecation @ > org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: > [deprecation @ > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: > [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read > | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method >
[jira] [Assigned] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
[jira] [Assigned] (SPARK-36456) Clean up the deprecation usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36456: - Assignee: Yang Jie
> Clean up the deprecation usage of o.a.c.io.IOUtils.closeQuietly
> --
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.1.2
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is deprecated` are as follows:
> {code:java}
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: [deprecation @ org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: [deprecation @ org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: [deprecation @ org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: [deprecation @ org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: [deprecation @ org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: [deprecation @ org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] >
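The warnings above all point at commons-io's deprecated `IOUtils.closeQuietly(Closeable)`, which commons-io deprecated in favor of the try-with-resources statement. A minimal, self-contained Java sketch of that migration is below; the `closeQuietly` helper only mimics what the deprecated method did (close and swallow `IOException`), and the method names are illustrative, not the actual replacement used in Spark's code:

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseQuietlyDemo {

    // What the deprecated IOUtils.closeQuietly(Closeable) did:
    // close the resource and swallow any IOException.
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // swallowed, matching the deprecated behaviour
        }
    }

    // Old pattern: explicit finally block with closeQuietly.
    static int countBytesOldStyle(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        try {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closeQuietly(in); // deprecated-style cleanup
        }
    }

    // New pattern: try-with-resources closes the stream automatically,
    // even when the body throws, with no helper needed.
    static int countBytesNewStyle(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Both styles read the same bytes; only the cleanup mechanism differs.
        System.out.println(countBytesOldStyle(data)); // prints 3
        System.out.println(countBytesNewStyle(data)); // prints 3
    }
}
```

One behavioural difference worth noting: try-with-resources propagates the `IOException` thrown by `close()` (or attaches it as a suppressed exception), whereas `closeQuietly` silently discarded it, so a migration can surface close-time failures that were previously invisible.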
[jira] [Commented] (SPARK-36474) Mention pandas API on Spark in Spark overview pages
[ https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397082#comment-17397082 ] Apache Spark commented on SPARK-36474: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33699 > Mention pandas API on Spark in Spark overview pages > --- > > Key: SPARK-36474 > URL: https://issues.apache.org/jira/browse/SPARK-36474 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Trivial > > We can mention that https://spark.apache.org/docs/latest/index.html as an > example. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36474) Mention pandas API on Spark in Spark overview pages
[ https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36474: Assignee: Apache Spark > Mention pandas API on Spark in Spark overview pages > --- > > Key: SPARK-36474 > URL: https://issues.apache.org/jira/browse/SPARK-36474 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Trivial > > We can mention that https://spark.apache.org/docs/latest/index.html as an > example.
[jira] [Assigned] (SPARK-36474) Mention pandas API on Spark in Spark overview pages
[ https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36474: Assignee: (was: Apache Spark) > Mention pandas API on Spark in Spark overview pages > --- > > Key: SPARK-36474 > URL: https://issues.apache.org/jira/browse/SPARK-36474 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Trivial > > We can mention that https://spark.apache.org/docs/latest/index.html as an > example.