[jira] [Updated] (SPARK-36473) Add non-unified memory metrics

2021-08-11 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-36473:
--
Description: 
Spark splits memory into managed memory (storage + execution) and unmanaged 
memory, and uses `spark.memory.fraction` to control the managed share of 
total memory.
But we have no metrics showing how much unmanaged memory is actually used, so 
there is no data to guide tuning of `spark.memory.fraction`. We should add 
such metrics.
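
For context, a rough sketch of how the managed pool is sized relative to the 
unmanaged remainder; this is simplified from Spark's UnifiedMemoryManager, and 
the constants are assumptions based on the 3.x defaults, not a quote of the 
code:

{code:scala}
// Simplified sizing logic (assumption: Spark 3.x defaults).
val systemMemory = Runtime.getRuntime.maxMemory            // executor heap
val reservedMemory = 300L * 1024 * 1024                    // fixed 300 MiB reserve
val usableMemory = systemMemory - reservedMemory
val memoryFraction = 0.6                                   // spark.memory.fraction
val managedMemory = (usableMemory * memoryFraction).toLong // storage + execution
val unmanagedMemory = usableMemory - managedMemory         // currently unmetered
{code}

The managed pool is already visible through executor metrics; the unmanaged 
remainder is not, which is the gap this ticket proposes to close.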

> Add non-unified memory metrics
> --
>
> Key: SPARK-36473
> URL: https://issues.apache.org/jira/browse/SPARK-36473
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Spark splits memory into managed memory (storage + execution) and unmanaged 
> memory, and uses `spark.memory.fraction` to control the managed share of 
> total memory.
> But we have no metrics showing how much unmanaged memory is actually used, 
> so there is no data to guide tuning of `spark.memory.fraction`. We should 
> add such metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36476) cloudpickle: ValueError: Cell is empty

2021-08-11 Thread Oliver Mannion (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397126#comment-17397126
 ] 

Oliver Mannion commented on SPARK-36476:


Looks like a similar issue was raised here: 
https://github.com/cloudpipe/cloudpickle/issues/393

> cloudpickle: ValueError: Cell is empty
> --
>
> Key: SPARK-36476
> URL: https://issues.apache.org/jira/browse/SPARK-36476
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
>Reporter: Oliver Mannion
>Priority: Major
>
> {code:python}
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
>  line 437, in dumps
> return cloudpickle.dumps(obj, pickle_protocol)
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 101, in dumps
> cp.dump(obj)
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 540, in dump
> return Pickler.dump(self, obj)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 437, in dump
> self.save(obj)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 504, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 789, in save_tuple
> save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 504, in save
> f(self, obj) # Call unbound method with explicit self
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 722, in save_function
> *self._dynamic_function_reduce(obj), obj=obj
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 659, in _save_reduce_pickle5
> dictitems=dictitems, obj=obj
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 638, in save_reduce
> save(args)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 504, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 789, in save_tuple
> save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 504, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 774, in save_tuple
> save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
> 504, in save
> f(self, obj) # Call unbound method with explicit self
>   File 
> "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py",
>  line 1226, in save_cell
> f = obj.cell_contents
> ValueError: Cell is empty
> {code}
> Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was 
> upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094).
> Also doesn't occur in Spark 3.1.2 with Python 3.8.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36476) cloudpickle: ValueError: Cell is empty

2021-08-11 Thread Oliver Mannion (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Mannion updated SPARK-36476:
---
Description: 
{code:python}
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
 line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 101, in dumps
cp.dump(obj)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 540, in dump
return Pickler.dump(self, obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
437, in dump
self.save(obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
638, in save_reduce
save(args)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
774, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py",
 line 1226, in save_cell
f = obj.cell_contents
ValueError: Cell is empty
{code}
Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was 
upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094).

Also doesn't occur in Spark 3.1.2 with Python 3.8.

 

  was:
{code:python}
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
 line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 101, in dumps
cp.dump(obj)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 540, in dump
return Pickler.dump(self, obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
437, in dump
self.save(obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
638, in save_reduce
save(args)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
774, in save_tuple
save(element)
  File 

[jira] [Updated] (SPARK-36476) cloudpickle: ValueError: Cell is empty

2021-08-11 Thread Oliver Mannion (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Mannion updated SPARK-36476:
---
Description: 
{code:python}
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
 line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 101, in dumps
cp.dump(obj)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 540, in dump
return Pickler.dump(self, obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
437, in dump
self.save(obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
638, in save_reduce
save(args)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
774, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py",
 line 1226, in save_cell
f = obj.cell_contents
ValueError: Cell is empty
{code}
Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was 
upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094).

Also doesn't occur in Spark 3.1.2 with Python 3.8.

 

  was:
{code:python}
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
 line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 101, in dumps
cp.dump(obj)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 540, in dump
return Pickler.dump(self, obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
437, in dump
self.save(obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
638, in save_reduce
save(args)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
774, in save_tuple
save(element)
  File 

[jira] [Created] (SPARK-36476) cloudpickle: ValueError: Cell is empty

2021-08-11 Thread Oliver Mannion (Jira)
Oliver Mannion created SPARK-36476:
--

 Summary: cloudpickle: ValueError: Cell is empty
 Key: SPARK-36476
 URL: https://issues.apache.org/jira/browse/SPARK-36476
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.1.2
Reporter: Oliver Mannion


{code:python}
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py",
 line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 101, in dumps
cp.dump(obj)
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 540, in dump
return Pickler.dump(self, obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
437, in dump
self.save(obj)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
 line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
638, in save_reduce
save(args)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
789, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
774, in save_tuple
save(element)
  File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 
504, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py",
 line 1226, in save_cell
f = obj.cell_contents
ValueError: Cell is empty
{code}
Doesn't occur in Spark 3.0.0, so possibly introduced when cloudpickle was 
upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094).

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36475) Add doc about spark.shuffle.service.fetch.rdd.enabled

2021-08-11 Thread angerszhu (Jira)
angerszhu created SPARK-36475:
-

 Summary: Add doc about spark.shuffle.service.fetch.rdd.enabled
 Key: SPARK-36475
 URL: https://issues.apache.org/jira/browse/SPARK-36475
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.2.0
Reporter: angerszhu


Add doc about spark.shuffle.service.fetch.rdd.enabled
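
As a starting point for the doc, a hedged example of turning the flag on. The 
config name spark.shuffle.service.fetch.rdd.enabled is real (added by 
SPARK-27677); the surrounding setup is illustrative only:

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative setup only; treat everything except the config names
// as an assumed example.
val spark = SparkSession.builder()
  .appName("fetch-rdd-from-shuffle-service")
  .config("spark.shuffle.service.enabled", "true")           // external service on
  .config("spark.shuffle.service.fetch.rdd.enabled", "true") // serve cached RDD blocks
  .config("spark.dynamicAllocation.enabled", "true")
  .getOrCreate()
{code}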



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-08-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397105#comment-17397105
 ] 

Apache Spark commented on SPARK-35881:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33701

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 3.2.0, 3.3.0
>
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.
>  
>  
>  
>  
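
For illustration, a hedged sketch of the reflection-based workaround the 
report describes. The private method name comes from the report itself; the 
rest is an assumed approximation, not the RAPIDS plugin's actual code:

{code:scala}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec

// Assumed approximation of the workaround described above.
def executeFinalStage(plan: AdaptiveSparkPlanExec) = {
  val m = classOf[AdaptiveSparkPlanExec].getDeclaredMethod("getFinalPhysicalPlan")
  m.setAccessible(true)                 // private method, hence reflection
  val finalPlan = m.invoke(plan).asInstanceOf[SparkPlan]
  if (finalPlan.supportsColumnar) {
    finalPlan.executeColumnar()         // RDD[ColumnarBatch]
  } else {
    finalPlan.execute()                 // RDD[InternalRow]
  }
}
{code}

A supported entry point would let callers get exactly this behavior without 
touching private API.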



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25888) Service requests for persist() blocks via external service after dynamic deallocation

2021-08-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397102#comment-17397102
 ] 

Apache Spark commented on SPARK-25888:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33700

> Service requests for persist() blocks via external service after dynamic 
> deallocation
> -
>
> Key: SPARK-25888
> URL: https://issues.apache.org/jira/browse/SPARK-25888
> Project: Spark
>  Issue Type: New Feature
>  Components: Block Manager, Shuffle, Spark Core, YARN
>Affects Versions: 2.3.2
>Reporter: Adam Kennedy
>Priority: Major
>  Labels: bulk-closed
>
> Large and highly multi-tenant Spark on YARN clusters with diverse job 
> execution often display terrible utilization rates (we have observed as low 
> as 3-7% CPU at max container allocation, but 50% CPU utilization on even a 
> well policed cluster is not uncommon).
> As a sizing example, consider a scenario with 1,000 nodes, 50,000 cores, 250 
> users and 50,000 runs of 1,000 distinct applications per week, with 
> predominantly Spark including a mixture of ETL, Ad Hoc tasks and PySpark 
> Notebook jobs (no streaming).
> Utilization problems appear to be due in large part to difficulties with 
> persist() blocks (DISK or DISK+MEMORY) preventing dynamic deallocation.
> In situations where an external shuffle service is present (which is typical 
> on clusters of this type) we already solve this for the shuffle block case by 
> offloading the IO handling of shuffle blocks to the external service, 
> allowing dynamic deallocation to proceed.
> Allowing Executors to transfer persist() blocks to some external "shuffle" 
> service in a similar manner would be an enormous win for Spark multi-tenancy 
> as it would limit deallocation blocking scenarios to only MEMORY-only cache() 
> scenarios.
> I'm not sure if I'm remembering correctly, but I seem to recall from the 
> original external shuffle service commits that this may have been considered 
> at the time, but that moving shuffle blocks to the external shuffle service 
> was the first priority.
> With support for external persist() DISK blocks in place, we could also then 
> handle deallocation of DISK+MEMORY, as the memory instance could first be 
> dropped, changing the block to DISK only, and then further transferred to the 
> shuffle service.
> We have tried to resolve the persist() issue via extensive user training, but 
> that has typically only allowed us to improve utilization of the worst 
> offenders (10% utilization) up to around 40-60% utilization, as the need for 
> persist() is often legitimate and occurs during the middle stages of a job.
> In a healthy multi-tenant scenario, a large job might spool up to say 10,000 
> cores, persist() data, release executors across a long tail down to 100 
> cores, and then spool back up to 10,000 cores for the following stage without 
> impact on the persist() data.
> In an ideal world, if a new executor started up on a node on which blocks 
> had been transferred to the shuffle service, the new executor might even be 
> able to "recapture" control of those blocks (if that would help with 
> performance in some way).
> And the behavior of gradually expanding up and down several times over the 
> course of a job would not just improve utilization, but would allow resources 
> to more easily be redistributed to other jobs which start on the cluster 
> during the long-tail periods, which would improve multi-tenancy and bring us 
> closer to optimal "envy free" YARN scheduling.
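
To make the scenario concrete, a hedged sketch of the configuration and usage 
pattern described above. The config names and APIs are real; the job itself 
is invented for illustration:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Invented example of the pattern the report describes.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")    // shuffle blocks already offloaded
  .set("spark.dynamicAllocation.enabled", "true")  // executors can scale down
val sc = new SparkContext(conf)

val mid = sc.textFile("hdfs:///data/input")
  .map(_.toUpperCase)
  .persist(StorageLevel.DISK_ONLY)                 // DISK persist() blocks
mid.count()
// The DISK blocks live on executors, so dynamic allocation cannot release
// those executors during the long tail -- the utilization gap this issue
// proposes to close by serving persist() blocks from the external service.
{code}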



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption

2021-08-11 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397100#comment-17397100
 ] 

Wenchen Fan commented on SPARK-34827:
-

Marking as a known issue LGTM. This is not a blocker to me as it only affects 
performance, not a correctness issue.
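
For reference, a hedged sketch of the two settings whose interaction is under 
discussion. Both config names exist in Spark (the second is an internal AQE 
flag), but treat the snippet as illustrative:

{code:scala}
import org.apache.spark.SparkConf

// Illustrative only: with I/O encryption on, contiguous shuffle blocks
// cannot be fetched in batch, which costs performance but not correctness.
val conf = new SparkConf()
  .set("spark.io.encryption.enabled", "true")                  // encrypt shuffle I/O
  .set("spark.sql.adaptive.fetchShuffleBlocksInBatch", "true") // AQE batch fetch
{code}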

> Support fetching shuffle blocks in batch with i/o encryption
> 
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36456.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33682
[https://github.com/apache/spark/pull/33682]
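
The warnings quoted below all flag the Commons IO closeQuietly(Closeable) 
deprecation. As a hedged sketch (not necessarily what pull request 33682 
actually does), the usual replacement is a small local helper that closes 
explicitly and swallows the close-time exception:

{code:scala}
import java.io.{Closeable, IOException}

// Assumed replacement pattern, shown for illustration only.
def closeQuietly(c: Closeable): Unit = {
  if (c != null) {
    try {
      c.close()
    } catch {
      case _: IOException => // ignore: best-effort cleanup path
    }
  }
}
{code}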

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.0
>
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> 

[jira] [Assigned] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36456:
-

Assignee: Yang Jie

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> 

[jira] [Commented] (SPARK-36474) Mention pandas API on Spark in Spark overview pages

2021-08-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397082#comment-17397082
 ] 

Apache Spark commented on SPARK-36474:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33699

> Mention pandas API on Spark in Spark overview pages
> ---
>
> Key: SPARK-36474
> URL: https://issues.apache.org/jira/browse/SPARK-36474
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Trivial
>
> We can mention it in https://spark.apache.org/docs/latest/index.html, for 
> example.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36474) Mention pandas API on Spark in Spark overview pages

2021-08-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36474:


Assignee: Apache Spark

> Mention pandas API on Spark in Spark overview pages
> ---
>
> Key: SPARK-36474
> URL: https://issues.apache.org/jira/browse/SPARK-36474
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Trivial
>
> We can mention it in https://spark.apache.org/docs/latest/index.html, for 
> example.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36474) Mention pandas API on Spark in Spark overview pages

2021-08-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36474:


Assignee: (was: Apache Spark)

> Mention pandas API on Spark in Spark overview pages
> ---
>
> Key: SPARK-36474
> URL: https://issues.apache.org/jira/browse/SPARK-36474
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Trivial
>
> We can mention it in https://spark.apache.org/docs/latest/index.html, for 
> example.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


