[jira] [Assigned] (SPARK-40144) Standalone log-view can't load new
[ https://issues.apache.org/jira/browse/SPARK-40144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40144: Assignee: Apache Spark > Standalone log-view can't load new > --- > > Key: SPARK-40144 > URL: https://issues.apache.org/jira/browse/SPARK-40144 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.0.0 >Reporter: Obobj >Assignee: Apache Spark >Priority: Minor > > log-view.js load new needs to call getBaseURI() of the utils.js file, but > does not reference utils.js -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40144) Standalone log-view can't load new
[ https://issues.apache.org/jira/browse/SPARK-40144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40144: Assignee: (was: Apache Spark) > Standalone log-view can't load new > --- > > Key: SPARK-40144 > URL: https://issues.apache.org/jira/browse/SPARK-40144 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.0.0 >Reporter: Obobj >Priority: Minor > > log-view.js load new needs to call getBaseURI() of the utils.js file, but > does not reference utils.js -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40144) Standalone log-view can't load new
[ https://issues.apache.org/jira/browse/SPARK-40144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581664#comment-17581664 ] Apache Spark commented on SPARK-40144: -- User 'obobj' has created a pull request for this issue: https://github.com/apache/spark/pull/37577 > Standalone log-view can't load new > --- > > Key: SPARK-40144 > URL: https://issues.apache.org/jira/browse/SPARK-40144 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.0.0 >Reporter: Obobj >Priority: Minor > > log-view.js load new needs to call getBaseURI() of the utils.js file, but > does not reference utils.js -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30115) Improve limit only query on datasource table
[ https://issues.apache.org/jira/browse/SPARK-30115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30115: Assignee: Apache Spark > Improve limit only query on datasource table > > > Key: SPARK-30115 > URL: https://issues.apache.org/jira/browse/SPARK-30115 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30115) Improve limit only query on datasource table
[ https://issues.apache.org/jira/browse/SPARK-30115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30115: Assignee: (was: Apache Spark) > Improve limit only query on datasource table > > > Key: SPARK-30115 > URL: https://issues.apache.org/jira/browse/SPARK-30115 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40144) Standalone log-view can't load new
Obobj created SPARK-40144:
------------------------------

             Summary: Standalone log-view can't load new
                 Key: SPARK-40144
                 URL: https://issues.apache.org/jira/browse/SPARK-40144
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Web UI
    Affects Versions: 3.0.0
            Reporter: Obobj

The "load new" action in log-view.js needs to call getBaseURI() from utils.js, but the page does not reference utils.js.
[jira] [Commented] (SPARK-40005) Self-contained examples with parameter descriptions in PySpark documentation
[ https://issues.apache.org/jira/browse/SPARK-40005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581658#comment-17581658 ]

Hyukjin Kwon commented on SPARK-40005:
--------------------------------------

I removed ML since ML has its own dedicated docs: https://spark.apache.org/docs/latest/ml-guide.html

> Self-contained examples with parameter descriptions in PySpark documentation
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-40005
>                 URL: https://issues.apache.org/jira/browse/SPARK-40005
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Critical
>
> This JIRA aims to improve PySpark documentation in:
> - {{pyspark}}
> - {{pyspark.sql}}
> - {{pyspark.sql.streaming}}
> We should:
> - Make the examples self-contained, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
> - Document {{Parameters}}, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot. There are many APIs in PySpark that are missing parameter descriptions, e.g., [DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union]
> If a file is large, e.g., dataframe.py, we should split the work into subtasks and improve the documentation per file.
[jira] [Updated] (SPARK-40005) Self-contained examples with parameter descriptions in PySpark documentation
[ https://issues.apache.org/jira/browse/SPARK-40005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40005: - Description: This JIRA aims to improve PySpark documentation in: - {{pyspark}} - {{pyspark.sql}} - {{pyspark.sql.streaming}} We should: - Make the examples self-contained, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html - Document {{Parameters}} https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot. There are many API that misses parameters in PySpark, e.g., [DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union] If the size of file is large, e.g., dataframe.py, we should split that down into each subtask, and improve documentation. was: This JIRA aims to improve PySpark documentation in: - {{pyspark}} - {{pyspark.ml}} - {{pyspark.sql}} - {{pyspark.sql.streaming}} We should: - Make the examples self-contained, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html - Document {{Parameters}} https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot. There are many API that misses parameters in PySpark, e.g., [DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union] If the size of file is large, e.g., dataframe.py, we should split that down into each subtask, and improve documentation. 
> Self-contained examples with parameter descriptions in PySpark documentation > > > Key: SPARK-40005 > URL: https://issues.apache.org/jira/browse/SPARK-40005 > Project: Spark > Issue Type: Umbrella > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Critical > > This JIRA aims to improve PySpark documentation in: > - {{pyspark}} > - {{pyspark.sql}} > - {{pyspark.sql.streaming}} > We should: > - Make the examples self-contained, e.g., > https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html > - Document {{Parameters}} > https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot. > There are many API that misses parameters in PySpark, e.g., > [DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union] > If the size of file is large, e.g., dataframe.py, we should split that down > into each subtask, and improve documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
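The "self-contained examples" goal described above can be illustrated with a plain-Python stand-in (this is only a sketch of the documentation style; it is not PySpark code, and the `union` helper here is hypothetical): each docstring example constructs its own inputs and shows its expected output inline, so it runs as a doctest with no hidden setup.

```python
import doctest


def union(left, right):
    """Return the concatenation of two sequences, keeping duplicates
    (a stand-in for documenting an API such as DataFrame.union).

    Parameters
    ----------
    left : list
        First sequence of rows.
    right : list
        Second sequence of rows.

    Examples
    --------
    A self-contained example builds its own data inline:

    >>> union([1, 2], [2, 3])
    [1, 2, 2, 3]
    """
    return list(left) + list(right)


# Verify that the docstring examples actually run and match their output,
# which is exactly what self-contained examples make possible.
results = doctest.testmod()
```

Written this way, the example doubles as a regression test: CI can execute every documented snippet and fail if the shown output drifts from reality.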
[jira] [Resolved] (SPARK-35542) Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[ https://issues.apache.org/jira/browse/SPARK-35542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu resolved SPARK-35542. Fix Version/s: 3.3.1 3.1.4 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 37568 [https://github.com/apache/spark/pull/37568] > Bucketizer created for multiple columns with parameters splitsArray, > inputCols and outputCols can not be loaded after saving it. > - > > Key: SPARK-35542 > URL: https://issues.apache.org/jira/browse/SPARK-35542 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 > Environment: {color:#172b4d}DataBricks Spark 3.1.1{color} >Reporter: Srikanth Pusarla >Assignee: Weichen Xu >Priority: Minor > Fix For: 3.3.1, 3.1.4, 3.2.3, 3.4.0 > > Attachments: Code-error.PNG, traceback.png > > > Bucketizer created for multiple columns with parameters *splitsArray*, > *inputCols* and *outputCols* can not be loaded after saving it. > The problem is not seen for Bucketizer created for single column. 
> *Code to reproduce*
>
> {code:python}
> from pyspark.ml.feature import Bucketizer
>
> df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,)], ["values"])
> bucketizer = Bucketizer(splitsArray=[[-float("inf"), 0.5, 1.4, float("inf")],
>                                      [-float("inf"), 0.1, 1.2, float("inf")]],
>                         inputCols=["values", "values"], outputCols=["b1", "b2"])
> bucketed = bucketizer.transform(df).collect()
> dfb = bucketizer.transform(df)
> print(dfb.show())
>
> bucketizerPath = "dbfs:/mnt/S3-Bucket/" + "Bucketizer"
> bucketizer.write().overwrite().save(bucketizerPath)
> loadedBucketizer = Bucketizer.load(bucketizerPath)   # <- failing here
> loadedBucketizer.getSplits() == bucketizer.getSplits()
> {code}
>
> The error message is:
> {code}
> TypeError: array() argument 1 must be a unicode character, not bytes
> {code}
>
> *Backtrace:*
> {code}
> TypeError                                 Traceback (most recent call last)
>      15
>      16 bucketizer.write().overwrite().save(bucketizerPath)
> ---> 17 loadedBucketizer = Bucketizer.load(bucketizerPath)
>      18 loadedBucketizer.getSplits() == bucketizer.getSplits()
>
> /databricks/spark/python/pyspark/ml/util.py in load(cls, path)
>     376 def load(cls, path):
>     377     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
> --> 378     return cls.read().load(path)
>
> /databricks/spark/python/pyspark/ml/util.py in load(self, path)
>     330     raise NotImplementedError("This Java ML type cannot be loaded into Python currently: %r"
>     331                               % self._clazz)
> --> 332     return self._clazz._from_java(java_obj)
>
> /databricks/spark/python/pyspark/ml/wrapper.py in _from_java(java_stage)
>     259     py_stage._resetUid(java_stage.uid())
> --> 260     py_stage._transfer_params_from_java()
>     261 elif hasattr(py_type, "_from_java"):
>     262     py_stage = py_type._from_java(java_stage)
>
> /databricks/spark/python/pyspark/ml/wrapper.py in _transfer_params_from_java(self)
>     186     # SPARK-14931: Only check set params back to avoid default params mismatch.
>     187     if self._java_obj.isSet(java_param):
> --> 188         value = _java2py(sc, self._java_obj.getOrDefault(java_param))
>     189         self._set(**{param.name: value})
>     190     # SPARK-10931: Temporary fix for params that have a default in Java
>
> /databricks/spark/python/pyspark/ml/common.py in _java2py(sc, r, encoding)
>     108     if isinstance(r, (bytearray, bytes)):
> --> 109         r = PickleSerializer().loads(bytes(r), encoding=encoding)
>     110     return r
>
> /databricks/spark/python/pyspark/serializers.py in loads(self, obj, encoding)
>     468 def loads(self, obj, encoding="bytes"):
> --> 469     return pickle.loads(obj, encoding=encoding)
>
> TypeError: array() argument 1 must be a unicode character, not bytes
> {code}
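The TypeError at the bottom of that backtrace originates in CPython's `array` constructor, which in Python 3 requires a `str` typecode and rejects `bytes`; when the Java-side splits arrays round-trip through the pickle path with a bytes typecode, loading fails. A minimal sketch of just that failure mode, independent of Spark:

```python
import array

# array() in Python 3 requires a str typecode such as "d" (double).
# A bytes typecode, as a bytes-mode unpickle can produce, raises
# TypeError instead of building the array.
try:
    array.array(b"d", [0.5, 1.4])      # bytes typecode: rejected
except TypeError as exc:
    print(type(exc).__name__)          # prints: TypeError

ok = array.array("d", [0.5, 1.4])      # str typecode: works
```

This is why the single-column Bucketizer (whose `splits` param is a plain list of floats) loads fine while the multi-column `splitsArray` variant trips over the typed-array round trip.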
[jira] [Resolved] (SPARK-40120) Make pyspark.sql.readwriter examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40120. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37570 [https://github.com/apache/spark/pull/37570] > Make pyspark.sql.readwriter examples self-contained > --- > > Key: SPARK-40120 > URL: https://issues.apache.org/jira/browse/SPARK-40120 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40120) Make pyspark.sql.readwriter examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40120: Assignee: Hyukjin Kwon > Make pyspark.sql.readwriter examples self-contained > --- > > Key: SPARK-40120 > URL: https://issues.apache.org/jira/browse/SPARK-40120 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40143) ANSI mode: allow explicitly casting fraction strings as Integral types
[ https://issues.apache.org/jira/browse/SPARK-40143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581635#comment-17581635 ] Apache Spark commented on SPARK-40143: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/37576 > ANSI mode: allow explicitly casting fraction strings as Integral types > --- > > Key: SPARK-40143 > URL: https://issues.apache.org/jira/browse/SPARK-40143 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > It's part of the ANSI SQL standard, and more consistent with the non-ansi > casting. We can have different behavior for implicit casting to avoid `'1.2' > = 1` returns true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40143) ANSI mode: allow explicitly casting fraction strings as Integral types
[ https://issues.apache.org/jira/browse/SPARK-40143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40143: Assignee: Gengliang Wang (was: Apache Spark) > ANSI mode: allow explicitly casting fraction strings as Integral types > --- > > Key: SPARK-40143 > URL: https://issues.apache.org/jira/browse/SPARK-40143 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > It's part of the ANSI SQL standard, and more consistent with the non-ansi > casting. We can have different behavior for implicit casting to avoid `'1.2' > = 1` returns true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40143) ANSI mode: allow explicitly casting fraction strings as Integral types
[ https://issues.apache.org/jira/browse/SPARK-40143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40143: Assignee: Apache Spark (was: Gengliang Wang) > ANSI mode: allow explicitly casting fraction strings as Integral types > --- > > Key: SPARK-40143 > URL: https://issues.apache.org/jira/browse/SPARK-40143 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > It's part of the ANSI SQL standard, and more consistent with the non-ansi > casting. We can have different behavior for implicit casting to avoid `'1.2' > = 1` returns true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37130) why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)
[ https://issues.apache.org/jira/browse/SPARK-37130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581631#comment-17581631 ]

Victor Tso commented on SPARK-37130:
------------------------------------

I have the same question. We want Hadoop to be provided by the environment, but at a minimum spark-hive should be there, as the environment couldn't possibly provide that.

> why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37130
>                 URL: https://issues.apache.org/jira/browse/SPARK-37130
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 3.1.2, 3.2.0
>            Reporter: Patrice DUROUX
>            Priority: Minor
>
> Hi,
> As my deployment has its own Hadoop(+Hive) installation, I tried to install Spark using the bundle without Hadoop. I suspect that some jars are missing that are present in the corresponding spark-X.X.X-bin-hadoop3.2.tgz. After comparing their contents, both spark-hive_2.12-X.X.X.jar and spark-hive-thriftserver_2.12-X.X.X.jar are absent from spark-X.X.X-bin-without-hadoop.tgz, and I don't know whether some others should also be there.
> Thanks,
> Patrice
[jira] [Updated] (SPARK-40143) ANSI mode: allow explicitly casting fraction strings as Integral types
[ https://issues.apache.org/jira/browse/SPARK-40143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-40143: --- Summary: ANSI mode: allow explicitly casting fraction strings as Integral types (was: ANSI mode: explicitly casting String as Integral types should allow fraction strings) > ANSI mode: allow explicitly casting fraction strings as Integral types > --- > > Key: SPARK-40143 > URL: https://issues.apache.org/jira/browse/SPARK-40143 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > It's part of the ANSI SQL standard, and more consistent with the non-ansi > casting. We can have different behavior for implicit casting to avoid `'1.2' > = 1` returns true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40143) ANSI mode: explicitly casting String as Integral types should allow fraction strings
Gengliang Wang created SPARK-40143: -- Summary: ANSI mode: explicitly casting String as Integral types should allow fraction strings Key: SPARK-40143 URL: https://issues.apache.org/jira/browse/SPARK-40143 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang It's part of the ANSI SQL standard, and more consistent with the non-ansi casting. We can have different behavior for implicit casting to avoid `'1.2' = 1` returns true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
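The proposed semantics can be approximated in plain Python (an illustrative sketch only; Spark implements this inside the Cast expression in Scala, and the function name here is made up): an explicit cast accepts a fraction string such as '1.2' by parsing it as a decimal and truncating toward zero, while still rejecting non-numeric strings as ANSI mode requires.

```python
from decimal import Decimal, InvalidOperation


def explicit_cast_to_int(s: str) -> int:
    """Hypothetical model of ANSI-mode explicit CAST(string AS INT):
    accept fraction strings like '1.2' by truncating toward zero,
    and raise on malformed input (mirroring the ANSI cast error)."""
    try:
        # int() on a Decimal truncates toward zero, matching the
        # non-ANSI cast result for fraction strings.
        return int(Decimal(s.strip()))
    except InvalidOperation:
        raise ValueError(f"invalid input syntax for type int: {s!r}")
```

Implicit casting can stay strict under this split, so a comparison like `'1.2' = 1` does not silently coerce the string and return true.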
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581628#comment-17581628 ] Apache Spark commented on SPARK-40142: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/37575 > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40142: Assignee: (was: Apache Spark) > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40142: Assignee: Apache Spark > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40142) Make pyspark.sql.functions examples self-contained
Hyukjin Kwon created SPARK-40142: Summary: Make pyspark.sql.functions examples self-contained Key: SPARK-40142 URL: https://issues.apache.org/jira/browse/SPARK-40142 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39271. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37567 [https://github.com/apache/spark/pull/37567] > Upgrade pandas to 1.4.3 > --- > > Key: SPARK-39271 > URL: https://issues.apache.org/jira/browse/SPARK-39271 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query
[ https://issues.apache.org/jira/browse/SPARK-40081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581606#comment-17581606 ] Hyukjin Kwon commented on SPARK-40081: -- [~dcoliversun] just asking. are you working on this? > Add Document Parameters for pyspark.sql.streaming.query > --- > > Key: SPARK-40081 > URL: https://issues.apache.org/jira/browse/SPARK-40081 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39271: Assignee: Yikun Jiang > Upgrade pandas to 1.4.3 > --- > > Key: SPARK-39271 > URL: https://issues.apache.org/jira/browse/SPARK-39271 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37751) Apache Commons Crypto doesn't support Java 11
[ https://issues.apache.org/jira/browse/SPARK-37751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581582#comment-17581582 ] Qiyuan Gong commented on SPARK-37751: - Hi [~benoit_roy] . Some hot fix for this issue: # Change back to Java 8 if possible. # Use Kernel 5.4 or higher. We found this reduce the possibility of this error. > Apache Commons Crypto doesn't support Java 11 > - > > Key: SPARK-37751 > URL: https://issues.apache.org/jira/browse/SPARK-37751 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 3.1.2, 3.2.0 > Environment: Spark 3.2.0 on kubernetes >Reporter: Shipeng Feng >Priority: Major > > For kubernetes, we are using Java 11 in docker, > [https://github.com/apache/spark/blob/v3.2.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile:] > {code:java} > ARG java_image_tag=11-jre-slim > {code} > We have a simple app: > {code:scala} > object SimpleApp { > def main(args: Array[String]) { > val session = SparkSession.builder.getOrCreate > > // the size of demo.csv is 5GB > val rdd = session.read.option("header", "true").option("inferSchema", > "true").csv("/data/demo.csv").rdd > val lines = rdd.repartition(200) > val count = lines.count() > } > } > {code} > > Enable AES-based encryption for RPC connection by the following config: > {code:java} > --conf spark.authenticate=true > --conf spark.network.crypto.enabled=true > {code} > This would cause the following error: > {code:java} > java.lang.IllegalArgumentException: Frame length should be positive: > -6119185687804983867 > at > org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) > at > org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:150) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:190) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) {code} > The error disappears in 8-jre-slim. 
It seems that Apache Commons Crypto 1.1.0 > only works with Java 8: > [https://commons.apache.org/proper/commons-crypto/download_crypto.cgi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
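The "Frame length should be positive" message above is a symptom rather than the root cause: the frame decoder reads an 8-byte big-endian length prefix from the (supposedly decrypted) stream, and when decryption goes wrong the prefix bytes are effectively random, so roughly half the time they decode to a negative long and trip the precondition seen in the stack trace. A minimal illustrative sketch of that failure mode (not Spark's actual `TransportFrameDecoder` code):

```java
import java.nio.ByteBuffer;

public class FrameLengthSketch {
    // Reads a big-endian 8-byte frame-length prefix, as a TCP frame
    // decoder typically would.
    static long readFrameLength(byte[] header) {
        return ByteBuffer.wrap(header).getLong();
    }

    public static void main(String[] args) {
        // Correctly decrypted header: a small positive frame length.
        byte[] good = {0, 0, 0, 0, 0, 0, 0, 42};
        System.out.println(readFrameLength(good)); // prints 42

        // Garbled header with the high bit set: decodes to a negative
        // long, which is exactly the condition the
        // Preconditions.checkArgument guard in the stack trace rejects.
        byte[] garbled = {(byte) 0x80, 0x15, 0x2F, 0x01, (byte) 0xDE, 0x33, 0x7A, 0x45};
        long len = readFrameLength(garbled);
        if (len <= 0) {
            System.out.println("Frame length should be positive: " + len);
        }
    }
}
```

This is why the error looks like a framing bug even though the underlying problem is the cipher layer producing garbage under Java 11.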
[jira] [Commented] (SPARK-40140) REST API for SQL level information does not show information on running queries
[ https://issues.apache.org/jira/browse/SPARK-40140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581580#comment-17581580 ] ming95 commented on SPARK-40140: I'm interested in this issue and I can fix it, if no one else is working on it. :) > REST API for SQL level information does not show information on running > queries > --- > > Key: SPARK-40140 > URL: https://issues.apache.org/jira/browse/SPARK-40140 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Priority: Minor > > Hi All, > We noticed that the SQL information REST API implemented in > https://issues.apache.org/jira/browse/SPARK-27142 does not return back SQL > queries which are currently running. We can only see queries which are > completed/failed. > As far as I can see, this should be supported since one of the fields in the > returned JSON is "runningJobIds". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
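The behavior described above is consistent with the endpoint filtering executions down to terminal states before serializing them. A hypothetical sketch of that pattern and the suggested fix (the enum and method names are illustrative only, not Spark's actual classes):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SqlStatusSketch {
    enum Status { RUNNING, COMPLETED, FAILED }

    // Buggy shape: serve only executions that reached a terminal state,
    // silently dropping anything still RUNNING.
    static List<Status> terminalOnly(List<Status> all) {
        return all.stream()
                  .filter(s -> s != Status.RUNNING)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Status> all = Arrays.asList(Status.RUNNING, Status.COMPLETED, Status.FAILED);
        System.out.println(terminalOnly(all)); // prints [COMPLETED, FAILED]
        // The fix suggested by the report: return every execution, so
        // clients can see RUNNING queries (and their runningJobIds) too.
        System.out.println(all); // prints [RUNNING, COMPLETED, FAILED]
    }
}
```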
[jira] [Assigned] (SPARK-40141) Task listener overloads no longer needed with JDK 8+
[ https://issues.apache.org/jira/browse/SPARK-40141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40141: Assignee: (was: Apache Spark) > Task listener overloads no longer needed with JDK 8+ > > > Key: SPARK-40141 > URL: https://issues.apache.org/jira/browse/SPARK-40141 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ryan Johnson >Priority: Major > > TaskContext defines methods for registering completion and failure listeners, > and the respective listener types qualify as functional interfaces in JDK 8+. > This leads to awkward ambiguous overload errors with the overload of each > function, that takes a function directly instead of a listener. Now that JDK > 8 is the minimum allowed, we can remove the unnecessary overloads, which not > only simplifies the code, but also removes a source of frustration since it > can be nearly impossible to predict when an ambiguous overload might be > triggered. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40141) Task listener overloads no longer needed with JDK 8+
[ https://issues.apache.org/jira/browse/SPARK-40141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581569#comment-17581569 ] Apache Spark commented on SPARK-40141: -- User 'ryan-johnson-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/37573 > Task listener overloads no longer needed with JDK 8+ > > > Key: SPARK-40141 > URL: https://issues.apache.org/jira/browse/SPARK-40141 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ryan Johnson >Priority: Major > > TaskContext defines methods for registering completion and failure listeners, > and the respective listener types qualify as functional interfaces in JDK 8+. > This leads to awkward ambiguous overload errors with the overload of each > function, that takes a function directly instead of a listener. Now that JDK > 8 is the minimum allowed, we can remove the unnecessary overloads, which not > only simplifies the code, but also removes a source of frustration since it > can be nearly impossible to predict when an ambiguous overload might be > triggered. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40141) Task listener overloads no longer needed with JDK 8+
[ https://issues.apache.org/jira/browse/SPARK-40141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40141: Assignee: Apache Spark > Task listener overloads no longer needed with JDK 8+ > > > Key: SPARK-40141 > URL: https://issues.apache.org/jira/browse/SPARK-40141 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ryan Johnson >Assignee: Apache Spark >Priority: Major > > TaskContext defines methods for registering completion and failure listeners, > and the respective listener types qualify as functional interfaces in JDK 8+. > This leads to awkward ambiguous overload errors with the overload of each > function, that takes a function directly instead of a listener. Now that JDK > 8 is the minimum allowed, we can remove the unnecessary overloads, which not > only simplifies the code, but also removes a source of frustration since it > can be nearly impossible to predict when an ambiguous overload might be > triggered. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36462) Allow Spark on Kube to operate without polling or watchers
[ https://issues.apache.org/jira/browse/SPARK-36462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36462: -- Affects Version/s: 3.4.0 (was: 3.2.0) (was: 3.3.0) > Allow Spark on Kube to operate without polling or watchers > -- > > Key: SPARK-36462 > URL: https://issues.apache.org/jira/browse/SPARK-36462 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > Fix For: 3.4.0 > > > Add an option to Spark on Kube to not track the individual executor pods and > just assume K8s is doing what it's asked. This would be a developer feature > intended for minimizing load on etcd & driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36462) Allow Spark on Kube to operate without polling or watchers
[ https://issues.apache.org/jira/browse/SPARK-36462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36462: - Assignee: Holden Karau > Allow Spark on Kube to operate without polling or watchers > -- > > Key: SPARK-36462 > URL: https://issues.apache.org/jira/browse/SPARK-36462 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > > Add an option to Spark on Kube to not track the individual executor pods and > just assume K8s is doing what it's asked. This would be a developer feature > intended for minimizing load on etcd & driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36462) Allow Spark on Kube to operate without polling or watchers
[ https://issues.apache.org/jira/browse/SPARK-36462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36462. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36433 [https://github.com/apache/spark/pull/36433] > Allow Spark on Kube to operate without polling or watchers > -- > > Key: SPARK-36462 > URL: https://issues.apache.org/jira/browse/SPARK-36462 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > Fix For: 3.4.0 > > > Add an option to Spark on Kube to not track the individual executor pods and > just assume K8s is doing what it's asked. This would be a developer feature > intended for minimizing load on etcd & driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40141) Task listener overloads no longer needed with JDK 8+
Ryan Johnson created SPARK-40141: Summary: Task listener overloads no longer needed with JDK 8+ Key: SPARK-40141 URL: https://issues.apache.org/jira/browse/SPARK-40141 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Ryan Johnson TaskContext defines methods for registering completion and failure listeners, and the respective listener types qualify as functional interfaces in JDK 8+. This leads to awkward ambiguous-overload errors between each listener-taking method and its overload that takes a function directly instead of a listener. Now that JDK 8 is the minimum allowed, we can remove the unnecessary overloads, which not only simplifies the code, but also removes a source of frustration since it can be nearly impossible to predict when an ambiguous overload might be triggered. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
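The ambiguity described above can be reproduced outside Spark with any pair of overloads whose parameters are shape-compatible functional interfaces: a bare lambda then matches both, and the compiler refuses to choose. A minimal Java sketch (the types are illustrative, not Spark's actual TaskContext API, which is Scala):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OverloadSketch {
    interface TaskCompletionListener { void onTaskCompletion(String ctx); }

    static class Ctx {
        final List<TaskCompletionListener> listeners = new ArrayList<>();

        Ctx addTaskCompletionListener(TaskCompletionListener l) {
            listeners.add(l);
            return this;
        }

        // The convenience overload this ticket removes: because
        // TaskCompletionListener is itself a functional interface,
        // a bare lambda now matches both methods.
        Ctx addTaskCompletionListener(Consumer<String> f) {
            return addTaskCompletionListener(f::accept);
        }

        void complete(String ctx) {
            listeners.forEach(l -> l.onTaskCompletion(ctx));
        }
    }

    public static void main(String[] args) {
        Ctx ctx = new Ctx();
        // ctx.addTaskCompletionListener(c -> System.out.println(c));
        //   ^ does not compile: "reference to addTaskCompletionListener
        //     is ambiguous" -- callers must cast to disambiguate:
        ctx.addTaskCompletionListener((TaskCompletionListener) c ->
                System.out.println("completed: " + c));
        ctx.complete("task-0");
    }
}
```

Deleting the `Consumer` overload makes the commented-out call compile again, which is the simplification the ticket proposes.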
[jira] [Assigned] (SPARK-40106) Task failure handlers should always run if the task failed
[ https://issues.apache.org/jira/browse/SPARK-40106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-40106: -- Assignee: Ryan Johnson > Task failure handlers should always run if the task failed > -- > > Key: SPARK-40106 > URL: https://issues.apache.org/jira/browse/SPARK-40106 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ryan Johnson >Assignee: Ryan Johnson >Priority: Major > Fix For: 3.4.0 > > > Today, if a task body succeeds, but a task completion listener fails, task > failure listeners are not called -- even tho the task has indeed failed at > that point. > If a completion listener fails, and failure listeners were not previously > invoked, we should invoke them before running the remaining completion > listeners. > Such a change would increase the utility of task listeners, especially ones > intended to assist with task cleanup. > To give one arbitrary example, code like this appears at several places in > the code (taken from {{executeTask}} method of FileFormatWriter.scala): > {code:java} > try { > Utils.tryWithSafeFinallyAndFailureCallbacks(block = { > // Execute the task to write rows out and commit the task. > dataWriter.writeWithIterator(iterator) > dataWriter.commit() > })(catchBlock = { > // If there is an error, abort the task > dataWriter.abort() > logError(s"Job $jobId aborted.") > }, finallyBlock = { > dataWriter.close() > }) > } catch { > case e: FetchFailedException => > throw e > case f: FileAlreadyExistsException if > SQLConf.get.fastFailFileFormatOutput => > // If any output file to write already exists, it does not make sense > to re-run this task. > // We throw the exception and let Executor throw ExceptionFailure to > abort the job. 
> throw new TaskOutputFileAlreadyExistException(f) > case t: Throwable => > throw QueryExecutionErrors.taskFailedWhileWritingRowsError(t) > }{code} > If failure listeners were reliably called, the above idiom could potentially > be factored out as two failure listeners plus a completion listener, and > reused rather than duplicated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
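The refactoring the last paragraph hints at could look roughly like the following sketch, assuming failure listeners are guaranteed to fire before completion listeners (all names here are illustrative, not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskListenerSketch {
    interface FailureListener { void onTaskFailure(Throwable t); }
    interface CompletionListener { void onTaskCompletion(); }

    static class TaskCtx {
        final List<FailureListener> failureListeners = new ArrayList<>();
        final List<CompletionListener> completionListeners = new ArrayList<>();

        void addTaskFailureListener(FailureListener f) { failureListeners.add(f); }
        void addTaskCompletionListener(CompletionListener c) { completionListeners.add(c); }

        // Run the task body. On failure, failure listeners fire first
        // (e.g. dataWriter.abort()), then completion listeners fire
        // unconditionally (e.g. dataWriter.close()), mirroring the
        // catchBlock/finallyBlock pairing in the quoted idiom.
        void runTask(Runnable body) {
            try {
                body.run();
            } catch (Throwable t) {
                for (FailureListener f : failureListeners) f.onTaskFailure(t);
                throw t; // precise rethrow: body throws only unchecked
            } finally {
                for (CompletionListener c : completionListeners) c.onTaskCompletion();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TaskCtx ctx = new TaskCtx();
        ctx.addTaskFailureListener(t -> log.add("abort"));
        ctx.addTaskCompletionListener(() -> log.add("close"));
        try {
            ctx.runTask(() -> { throw new RuntimeException("write failed"); });
        } catch (RuntimeException expected) {
            // the failure still propagates after listeners run
        }
        System.out.println(log); // prints [abort, close]
    }
}
```

With guaranteed ordering like this, the abort/close cleanup would be registered once and reused, rather than duplicated at each call site as the ticket describes.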
[jira] [Resolved] (SPARK-40106) Task failure handlers should always run if the task failed
[ https://issues.apache.org/jira/browse/SPARK-40106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-40106. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37531 [https://github.com/apache/spark/pull/37531] > Task failure handlers should always run if the task failed > -- > > Key: SPARK-40106 > URL: https://issues.apache.org/jira/browse/SPARK-40106 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ryan Johnson >Priority: Major > Fix For: 3.4.0 > > > Today, if a task body succeeds, but a task completion listener fails, task > failure listeners are not called -- even tho the task has indeed failed at > that point. > If a completion listener fails, and failure listeners were not previously > invoked, we should invoke them before running the remaining completion > listeners. > Such a change would increase the utility of task listeners, especially ones > intended to assist with task cleanup. > To give one arbitrary example, code like this appears at several places in > the code (taken from {{executeTask}} method of FileFormatWriter.scala): > {code:java} > try { > Utils.tryWithSafeFinallyAndFailureCallbacks(block = { > // Execute the task to write rows out and commit the task. > dataWriter.writeWithIterator(iterator) > dataWriter.commit() > })(catchBlock = { > // If there is an error, abort the task > dataWriter.abort() > logError(s"Job $jobId aborted.") > }, finallyBlock = { > dataWriter.close() > }) > } catch { > case e: FetchFailedException => > throw e > case f: FileAlreadyExistsException if > SQLConf.get.fastFailFileFormatOutput => > // If any output file to write already exists, it does not make sense > to re-run this task. > // We throw the exception and let Executor throw ExceptionFailure to > abort the job. 
> throw new TaskOutputFileAlreadyExistException(f) > case t: Throwable => > throw QueryExecutionErrors.taskFailedWhileWritingRowsError(t) > }{code} > If failure listeners were reliably called, the above idiom could potentially > be factored out as two failure listeners plus a completion listener, and > reused rather than duplicated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields
[ https://issues.apache.org/jira/browse/SPARK-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4: Assignee: (was: Apache Spark) > Add config to toggle whether to automatically add default values for INSERTs > without user-specified fields > -- > > Key: SPARK-4 > URL: https://issues.apache.org/jira/browse/SPARK-4 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields
[ https://issues.apache.org/jira/browse/SPARK-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4: Assignee: Apache Spark > Add config to toggle whether to automatically add default values for INSERTs > without user-specified fields > -- > > Key: SPARK-4 > URL: https://issues.apache.org/jira/browse/SPARK-4 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields
[ https://issues.apache.org/jira/browse/SPARK-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-4: --- Assignee: (was: Daniel) This is reverted via https://github.com/apache/spark/commit/50c163578cfef79002fbdbc54b3b8fc10cfbcf65 > Add config to toggle whether to automatically add default values for INSERTs > without user-specified fields > -- > > Key: SPARK-4 > URL: https://issues.apache.org/jira/browse/SPARK-4 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields
[ https://issues.apache.org/jira/browse/SPARK-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-4: -- Fix Version/s: (was: 3.4.0) > Add config to toggle whether to automatically add default values for INSERTs > without user-specified fields > -- > > Key: SPARK-4 > URL: https://issues.apache.org/jira/browse/SPARK-4 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37751) Apache Commons Crypto doesn't support Java 11
[ https://issues.apache.org/jira/browse/SPARK-37751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581506#comment-17581506 ] Benoit Roy edited comment on SPARK-37751 at 8/18/22 8:10 PM: - Hello, we have also encountered this issue after upgrading to Java11 (we also migrated to Spark 3.3.0 - standalone), so this also affects Spark 3.3.0 version. Any suggestions how we can resolve this? - aside from _spark.network.crypto.enabled_ ? was (Author: JIRAUSER293512): Hello, we have also encountered this issue after upgrading to Java11 (we also migrated to Spark 3.3.0 - standalone), so this appears also appears to affect Spark 3.3.0 version. Any suggestions how we can resolve this? - aside from _spark.network.crypto.enabled_ ? > Apache Commons Crypto doesn't support Java 11 > - > > Key: SPARK-37751 > URL: https://issues.apache.org/jira/browse/SPARK-37751 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 3.1.2, 3.2.0 > Environment: Spark 3.2.0 on kubernetes >Reporter: Shipeng Feng >Priority: Major > > For kubernetes, we are using Java 11 in docker, > [https://github.com/apache/spark/blob/v3.2.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile:] > {code:java} > ARG java_image_tag=11-jre-slim > {code} > We have a simple app: > {code:scala} > object SimpleApp { > def main(args: Array[String]) { > val session = SparkSession.builder.getOrCreate > > // the size of demo.csv is 5GB > val rdd = session.read.option("header", "true").option("inferSchema", > "true").csv("/data/demo.csv").rdd > val lines = rdd.repartition(200) > val count = lines.count() > } > } > {code} > > Enable AES-based encryption for RPC connection by the following config: > {code:java} > --conf spark.authenticate=true > --conf spark.network.crypto.enabled=true > {code} > This would cause the following error: > {code:java} > java.lang.IllegalArgumentException: Frame length should be positive: > -6119185687804983867 > at > 
org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) > at > org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:150) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:190) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) {code} > The error disappears in 8-jre-slim. It seems that Apache Commons Crypto 1.1.0 > only works with Java 8: > [https://commons.apache.org/proper/commons-crypto/download_crypto.cgi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Updated] (SPARK-37544) sequence over dates with month interval is producing incorrect results
[ https://issues.apache.org/jira/browse/SPARK-37544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37544: -- Fix Version/s: 3.1.4 > sequence over dates with month interval is producing incorrect results > -- > > Key: SPARK-37544 > URL: https://issues.apache.org/jira/browse/SPARK-37544 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.0 > Environment: Ubuntu 20, OSX 11.6 > OpenJDK 11, Spark 3.2 >Reporter: Vsevolod Ostapenko >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > The sequence function with dates and a step interval in months is producing unexpected > results. > Here is a sample using Spark 3.2 (though the behavior is the same in 3.1.1 > and presumably earlier): > {{scala> spark.sql("select sequence(date '2021-01-01', date '2022-01-01', > interval '3' month) x, date '2021-01-01' + interval '3' month y").collect()}} > {{res1: Array[org.apache.spark.sql.Row] = Array([WrappedArray(2021-01-01, > *2021-03-31, 2021-06-30, 2021-09-30,* > 2022-01-01),2021-04-01])}} > The expected result of adding 3 months to 2021-01-01 is 2021-04-01, while > sequence returns 2021-03-31. > At the same time sequence over timestamps works as expected: > {{scala> spark.sql("select sequence(timestamp '2021-01-01 00:00', timestamp > '2022-01-01 00:00', interval '3' month) x").collect()}} > {{res2: Array[org.apache.spark.sql.Row] = Array([WrappedArray(2021-01-01 > 00:00:00.0, *2021-04-01* 00:00:00.0, *2021-07-01* 00:00:00.0, *2021-10-01* > 00:00:00.0, 2022-01-01 00:00:00.0)])}} > > A similar issue was reported in the past - [SPARK-31654] sequence producing > inconsistent intervals for month step - ASF JIRA (apache.org) > It's marked resolved, but the problem has either resurfaced or was never > actually fixed. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
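The expected stepping in the report above matches plain java.time month arithmetic, shown here for reference; this sketch only demonstrates the expected values, not Spark's actual sequence implementation:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class MonthSequenceSketch {
    // Naive month-stepped sequence: start, start+3mo, start+6mo, ...
    static List<LocalDate> sequence(LocalDate start, LocalDate end, int stepMonths) {
        List<LocalDate> out = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusMonths(stepMonths)) {
            out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        // Expected: 2021-01-01, 2021-04-01, 2021-07-01, 2021-10-01, 2022-01-01
        // (the reported date variant instead produced 2021-03-31,
        //  2021-06-30, 2021-09-30 for the middle steps)
        System.out.println(sequence(LocalDate.of(2021, 1, 1),
                                    LocalDate.of(2022, 1, 1), 3));
    }
}
```

Note that `LocalDate.plusMonths` of a month-start date always lands on a month start, so the end-of-month dates in the buggy output cannot come from straightforward month addition.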
[jira] [Updated] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40134: -- Affects Version/s: 3.4.0 > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40134: -- Affects Version/s: 3.3.0 > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40134: -- Affects Version/s: (was: 3.4.0) > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40134: -- Fix Version/s: 3.4.0 > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40134: -- Fix Version/s: (was: 3.4.0) > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40134. --- Fix Version/s: 3.3.1 3.4.0 Resolution: Fixed Issue resolved by pull request 37563 [https://github.com/apache/spark/pull/37563] > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: William Hyun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40134) Update ORC to 1.7.6
[ https://issues.apache.org/jira/browse/SPARK-40134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40134: - Assignee: William Hyun > Update ORC to 1.7.6 > --- > > Key: SPARK-40134 > URL: https://issues.apache.org/jira/browse/SPARK-40134 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37751) Apache Commons Crypto doesn't support Java 11
[ https://issues.apache.org/jira/browse/SPARK-37751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581506#comment-17581506 ] Benoit Roy edited comment on SPARK-37751 at 8/18/22 7:54 PM: - Hello, we have also encountered this issue after upgrading to Java11 (we also migrated to Spark 3.3.0 - standalone), so this appears also appears to affect Spark 3.3.0 version. Any suggestions how we can resolve this? - aside from _spark.network.crypto.enabled_ ? was (Author: JIRAUSER293512): Hello, we have also encountered this issue after upgrading to Java11 (we also migrated to Spark 3.3.0), so this appears also appears to affect Spark 3.3.0 version. Any suggestions how we can resolve this? - aside from _spark.network.crypto.enabled_ ? > Apache Commons Crypto doesn't support Java 11 > - > > Key: SPARK-37751 > URL: https://issues.apache.org/jira/browse/SPARK-37751 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 3.1.2, 3.2.0 > Environment: Spark 3.2.0 on kubernetes >Reporter: Shipeng Feng >Priority: Major > > For kubernetes, we are using Java 11 in docker, > [https://github.com/apache/spark/blob/v3.2.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile:] > {code:java} > ARG java_image_tag=11-jre-slim > {code} > We have a simple app: > {code:scala} > object SimpleApp { > def main(args: Array[String]) { > val session = SparkSession.builder.getOrCreate > > // the size of demo.csv is 5GB > val rdd = session.read.option("header", "true").option("inferSchema", > "true").csv("/data/demo.csv").rdd > val lines = rdd.repartition(200) > val count = lines.count() > } > } > {code} > > Enable AES-based encryption for RPC connection by the following config: > {code:java} > --conf spark.authenticate=true > --conf spark.network.crypto.enabled=true > {code} > This would cause the following error: > {code:java} > java.lang.IllegalArgumentException: Frame length should be positive: > -6119185687804983867 > at > 
org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) > at > org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:150) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:190) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) {code} > The error disappears in 8-jre-slim. It seems that Apache Commons Crypto 1.1.0 > only works with Java 8: > [https://commons.apache.org/proper/commons-crypto/download_crypto.cgi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
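The comment above asks for options other than turning off _spark.network.crypto.enabled_ altogether. One possibility, offered here only as a hedged sketch and not a confirmed fix, is to keep RPC authentication on but fall back to SASL-based encryption, which does not go through Commons Crypto's native code:

```
# Sketch of a possible workaround, assuming SASL-based encryption is
# acceptable for this deployment. spark.authenticate.enableSaslEncryption
# is the standard Spark config for SASL-based RPC encryption.
--conf spark.authenticate=true
--conf spark.network.crypto.enabled=false
--conf spark.authenticate.enableSaslEncryption=true
```

Whether this is acceptable depends on the deployment's security requirements: it trades the AES-based cipher for the older SASL encryption mechanism.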
[jira] [Commented] (SPARK-39399) proxy-user not working for Spark on k8s in cluster deploy mode
[ https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581495#comment-17581495 ] Shrikant Prasad commented on SPARK-39399: - [~dongjoon] [~hyukjin.kwon] Can you please have a look at this issue and let me know if I need to add any more details in order to take this forward? > proxy-user not working for Spark on k8s in cluster deploy mode > -- > > Key: SPARK-39399 > URL: https://issues.apache.org/jira/browse/SPARK-39399 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Priority: Major > > As part of https://issues.apache.org/jira/browse/SPARK-25355, proxy-user > support was added for Spark on K8s, but the PR only added the proxy-user argument > to the spark-submit command. The actual authentication using > the proxy user does not work in cluster deploy mode. > We get an AccessControlException when trying to access kerberized HDFS > through a proxy user. 
> Spark-Submit: > $SPARK_HOME/bin/spark-submit \ > --master \ > --deploy-mode cluster \ > --name with_proxy_user_di \ > --proxy-user \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.limit.cores=1 \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.kubernetes.namespace= \ > --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \ > --conf spark.kubernetes.file.upload.path=hdfs:///tmp \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar > Driver Logs: > {code:java} > ++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -z ']' > + '[' -z ']' > + '[' -n '' ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.SparkPi spark-internal > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to 
constructor > java.nio.DirectByteBuffer(long,int) > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful > kerberos logins and latency (milliseconds)"}, valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos > logins and latency (milliseconds)"}, valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, > valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field private > org.apache.hadoop.metrics2.lib.MutableGaugeLong > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal > with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops",
[jira] [Commented] (SPARK-39993) Spark on Kubernetes doesn't filter data by date
[ https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581492#comment-17581492 ] Shrikant Prasad commented on SPARK-39993: - [~h.liashchuk] The code snippet you have shared is working fine in cluster deploy mode if we write to HDFS instead of S3, so I don't think there is any issue with the k8s master. You might also first check the output of df.show() to see whether the df contains the expected rows or not. > Spark on Kubernetes doesn't filter data by date > --- > > Key: SPARK-39993 > URL: https://issues.apache.org/jira/browse/SPARK-39993 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.2 > Environment: Kubernetes v1.23.6 > Spark 3.2.2 > Java 1.8.0_312 > Python 3.9.13 > Aws dependencies: > aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar >Reporter: Hanna Liashchuk >Priority: Major > Labels: kubernetes > > I'm creating a Dataset with a column of type date and saving it to S3. When I read it > back and use a where() clause, I've noticed it doesn't return data even > though the data is there. > Below is the code snippet I'm running: > > {code:python} > from pyspark.sql.types import Row > from pyspark.sql.functions import * > ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", > col("date").cast("date")) > ds.where("date = '2022-01-01'").show() > ds.write.mode("overwrite").parquet("s3a://bucket/test") > df = spark.read.format("parquet").load("s3a://bucket/test") > df.where("date = '2022-01-01'").show() > {code} > The first show() returns data, while the second one does not. > I've noticed that it's related to the Kubernetes master, as the same code snippet > works fine with master "local". > UPD: if the column is used as a partition and has the type "date", there is no > filtering problem. 
> > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
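To help narrow down where the rows disappear, one diagnostic, a sketch only, assuming a live PySpark session named `spark` on the Kubernetes master and assuming (unconfirmed) that parquet predicate pushdown is involved, is to re-run the failing read with pushdown disabled and compare:

```
# Diagnostic sketch (assumes an existing SparkSession `spark`; the bucket path
# is the one from the report). If the rows come back with pushdown disabled,
# the problem is likely in the parquet filter pushdown path rather than in
# Kubernetes scheduling itself.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
df = spark.read.format("parquet").load("s3a://bucket/test")
df.where("date = '2022-01-01'").show()
```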
[jira] [Resolved] (SPARK-40136) Incorrect fragment of query context
[ https://issues.apache.org/jira/browse/SPARK-40136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40136. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37566 [https://github.com/apache/spark/pull/37566] > Incorrect fragment of query context > --- > > Key: SPARK-40136 > URL: https://issues.apache.org/jira/browse/SPARK-40136 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > The query context contains just a part of the fragment. The code below > demonstrates the issue: > {code:scala} > withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { > val e = intercept[SparkArithmeticException] { > sql("select 1 / 0").collect() > } > println("'" + e.getQueryContext()(0).fragment() + "'") > } > '1 / ' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40124. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37554 [https://github.com/apache/spark/pull/37554] > Update TPCDS v1.4 q32 for Plan Stability tests > -- > > Key: SPARK-40124 > URL: https://issues.apache.org/jira/browse/SPARK-40124 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kapil Singh >Assignee: Kapil Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40124: - Assignee: Kapil Singh > Update TPCDS v1.4 q32 for Plan Stability tests > -- > > Key: SPARK-40124 > URL: https://issues.apache.org/jira/browse/SPARK-40124 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kapil Singh >Assignee: Kapil Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields
[ https://issues.apache.org/jira/browse/SPARK-40000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581440#comment-17581440 ] Apache Spark commented on SPARK-40000: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/37572 > Add config to toggle whether to automatically add default values for INSERTs > without user-specified fields > -- > > Key: SPARK-40000 > URL: https://issues.apache.org/jira/browse/SPARK-40000 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581424#comment-17581424 ] Yeachan Park commented on SPARK-27142: -- Hi all, thanks a lot for working on this. I recently made a bug report regarding this story, https://issues.apache.org/jira/browse/SPARK-40140. Would you be able to take a look? Thanks! > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Minor > Fix For: 3.1.0 > > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently, SQL information for monitoring a Spark application is not available > from the REST API but only via the UI. The REST API provides only > applications, jobs, stages, and environment. This Jira is targeted at providing a REST > API so that SQL-level information can be found. > > Details: > https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40140) REST API for SQL level information does not show information on running queries
Yeachan Park created SPARK-40140: Summary: REST API for SQL level information does not show information on running queries Key: SPARK-40140 URL: https://issues.apache.org/jira/browse/SPARK-40140 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Yeachan Park Hi All, We noticed that the SQL information REST API implemented in https://issues.apache.org/jira/browse/SPARK-27142 does not return SQL queries that are currently running; we can only see queries that have completed or failed. As far as I can see, this should be supported, since one of the fields in the returned JSON is "runningJobIds". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
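For context, the endpoint being discussed is the per-application SQL resource added by SPARK-27142. A quick way to inspect what it actually returns, sketched here with a hypothetical driver host and application id, is to query it on a live driver UI:

```
# The host and application id below are placeholders, not values from the
# report. The /sql resource lists SQL executions along with fields such as
# status and runningJobIds.
curl -s "http://driver-host:4040/api/v1/applications/app-20220818120000-0000/sql"
```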
[jira] [Updated] (SPARK-40139) Filter jobs by Job Group in UI & API
[ https://issues.apache.org/jira/browse/SPARK-40139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yeachan Park updated SPARK-40139: - Description: Hi all, We have some cases where it would be useful to see a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that the REST API provided by Spark does not allow filtering jobs by specific job groups. It would be great to have the ability to filter jobs by their job group id in both the UI and the API. was: Hi all, We have some cases where it would be useful to see a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that the REST API provided by Spark does not allow filtering jobs by specific job groups. It would be great to have the ability to filter jobs by their job group id in both the UI and the API. > Filter jobs by Job Group in UI & API > > > Key: SPARK-40139 > URL: https://issues.apache.org/jira/browse/SPARK-40139 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Priority: Minor > > Hi all, > We have some cases where it would be useful to see a list of all > jobs belonging to the same job group in the Spark UI. This doesn't yet seem > possible. We also noticed that the REST API provided by Spark does not allow > filtering jobs by specific job groups. > It would be great to have the ability to filter jobs by > their job group id in both the UI and the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40139) Filter jobs by Job Group in UI & API
[ https://issues.apache.org/jira/browse/SPARK-40139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yeachan Park updated SPARK-40139: - Description: Hi all, We have some cases where it would be useful to be able to have a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that in the REST API provided by spark, it's not possible to filter in jobs by specific job groups. It would be great if we can have the functionality to filter on jobs based on their job group id in the UI and in the API. was: Hi all, We have some cases where it would be useful to be able to have a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that in the REST API provided by spark, it's not possible to filter in jobs by specific job groups. It would be great if we can have the functionality to filter on jobs based on their job group id in the UI and in the API. Thank you > Filter jobs by Job Group in UI & API > > > Key: SPARK-40139 > URL: https://issues.apache.org/jira/browse/SPARK-40139 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Priority: Minor > > Hi all, > We have some cases where it would be useful to be able to have a list of all > jobs belonging to the same job group in the Spark UI. This doesn't yet seem > possible. We also noticed that in the REST API provided by spark, it's not > possible to filter in jobs by specific job groups. > It would be great if we can have the functionality to filter on jobs based on > their job group id in the UI and in the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40139) Filter jobs by Job Group in UI & API
[ https://issues.apache.org/jira/browse/SPARK-40139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yeachan Park updated SPARK-40139: - Description: Hi all, We have some cases where it would be useful to be able to have a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that in the REST API provided by spark, it's not possible to filter in jobs by specific job groups. It would be great if we can have the functionality to filter on jobs based on their job group id in the UI and in the API. Thank you, Yeachan was: Hi all, We have some cases where it would be useful to be able to have a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that in the REST API provided by spark, it's not possible to filter on specific job groups. It would be great if we can have the functionality to filter on jobs based on their job group id in the UI and in the API. Thank you, Yeachan > Filter jobs by Job Group in UI & API > > > Key: SPARK-40139 > URL: https://issues.apache.org/jira/browse/SPARK-40139 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Priority: Minor > > Hi all, > We have some cases where it would be useful to be able to have a list of all > jobs belonging to the same job group in the Spark UI. This doesn't yet seem > possible. We also noticed that in the REST API provided by spark, it's not > possible to filter in jobs by specific job groups. > It would be great if we can have the functionality to filter on jobs based on > their job group id in the UI and in the API. > Thank you, > Yeachan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40139) Filter jobs by Job Group in UI & API
Yeachan Park created SPARK-40139: Summary: Filter jobs by Job Group in UI & API Key: SPARK-40139 URL: https://issues.apache.org/jira/browse/SPARK-40139 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.2.0 Reporter: Yeachan Park Hi all, We have some cases where it would be useful to be able to have a list of all jobs belonging to the same job group in the Spark UI. This doesn't yet seem possible. We also noticed that in the REST API provided by spark, it's not possible to filter on specific job groups. It would be great if we can have the functionality to filter on jobs based on their job group id in the UI and in the API. Thank you, Yeachan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
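Until such a filter exists in the UI or API, the same effect can be had client-side: job groups are assigned via SparkContext.setJobGroup, and the monitoring REST API's jobs endpoint exposes a jobGroup field per job. A minimal sketch, with the HTTP call replaced by a canned payload shaped like a /api/v1/applications/<app-id>/jobs response so it runs standalone (the helper name is ours, not a Spark API):

```python
def filter_jobs_by_group(jobs, group_id):
    """Keep only the jobs whose jobGroup matches group_id."""
    return [j for j in jobs if j.get("jobGroup") == group_id]

# Canned stand-in for the JSON list returned by
# /api/v1/applications/<app-id>/jobs (fields abbreviated).
jobs = [
    {"jobId": 0, "name": "count at etl.py:10", "jobGroup": "etl"},
    {"jobId": 1, "name": "collect at repl:1", "jobGroup": "adhoc"},
    {"jobId": 2, "name": "save at etl.py:42", "jobGroup": "etl"},
]

etl_jobs = filter_jobs_by_group(jobs, "etl")
print([j["jobId"] for j in etl_jobs])  # → [0, 2]
```

In a real client the `jobs` list would come from `requests.get(...)` against the application's API endpoint; jobs started outside any group simply lack a matching `jobGroup` and fall out of the filter.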
[jira] [Assigned] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-38909: --- Assignee: Yang Jie > Encapsulate LevelDB used by ExternalShuffleBlockResolver and > YarnShuffleService as LocalDB > -- > > Key: SPARK-38909 > URL: https://issues.apache.org/jira/browse/SPARK-38909 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{LevelDB}} > directly, which is not conducive to extending the use of {{RocksDB}} in > this scenario. This PR encapsulates that access for extensibility. It will be the > pre-work of SPARK-3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-38909. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36200 [https://github.com/apache/spark/pull/36200] > Encapsulate LevelDB used by ExternalShuffleBlockResolver and > YarnShuffleService as LocalDB > -- > > Key: SPARK-38909 > URL: https://issues.apache.org/jira/browse/SPARK-38909 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{LevelDB}} > directly, which is not conducive to extending the use of {{RocksDB}} in > this scenario. This PR encapsulates that access for extensibility. It will be the > pre-work of SPARK-3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39975) Upgrade rocksdbjni to 7.4.5
[ https://issues.apache.org/jira/browse/SPARK-39975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-39975. -- Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed Resolved by https://github.com/apache/spark/pull/37543 > Upgrade rocksdbjni to 7.4.5 > --- > > Key: SPARK-39975 > URL: https://issues.apache.org/jira/browse/SPARK-39975 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > [https://github.com/facebook/rocksdb/releases/tag/v7.4.5] > > {code:java} > Fix a bug starting in 7.4.0 in which some fsync operations might be skipped > in a DB after any DropColumnFamily on that DB, until it is re-opened. This > can lead to data loss on power loss. (For custom FileSystem implementations, > this could lead to FSDirectory::Fsync or FSDirectory::Close after the first > FSDirectory::Close; Also, valgrind could report call to close() with fd=-1.) > {code} > > Fixed a bug that caused data loss > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21487) WebUI-Executors Page results in "Request is a replay (34) attack"
[ https://issues.apache.org/jira/browse/SPARK-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581363#comment-17581363 ] Apache Spark commented on SPARK-21487: -- User 'santosh-d3vpl3x' has created a pull request for this issue: https://github.com/apache/spark/pull/37571 > WebUI-Executors Page results in "Request is a replay (34) attack" > - > > Key: SPARK-21487 > URL: https://issues.apache.org/jira/browse/SPARK-21487 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.1 >Reporter: ShuMing Li >Priority: Minor > > We recently upgraded Spark from 2.0.2 to 2.1.1, and the Web UI `Executors > Page` became empty, with the exception below. > The `Executors` page is rendered with JavaScript rather than Scala in > 2.1.1, but I don't know why this causes the problem. > Perhaps "two queries submitted at the same time with the same timestamp" may > cause this, but I'm not sure. > ResourceManager log: > {code:java} > 2017-07-20 20:39:09,371 WARN > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > {code} > Safari browser console: > {code:java} > Failed to load resource: the server responded with a status of 403 > (GSSException: Failure unspecified at GSS-API level (Mechanism level: Request > is a replay > (34)))http://hadoop-rm-host:8088/proxy/application_1494564992156_2751285/static/executorspage-template.html > {code} > Related Links: > https://issues.apache.org/jira/browse/HIVE-12481 > https://issues.apache.org/jira/browse/HADOOP-8830 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21487) WebUI-Executors Page results in "Request is a replay (34) attack"
[ https://issues.apache.org/jira/browse/SPARK-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581351#comment-17581351 ] Santosh Pingale commented on SPARK-21487: - I believe this is still an issue with kerberized Hadoop clusters, and sometimes a major one when you have to debug something. The issue is present in all the secured Hadoop clusters I have worked with. Finally, at my current org, I managed to get it working by patching Spark internally. *What's happening:* At some point, the Spark UI started sending Mustache templates over AJAX calls for client-side rendering to improve performance. Those template files have a `.html` extension, so YARN applies the authentication filter twice! *What could be done:* While YARN is causing the issue here, Spark can also make itself agnostic to it and let users use the Spark UI in its true form. The Mustache templates should really have a `.mustache` extension instead of `.html`; this change alone allows the templates to render properly. I have tested it to work locally and on a cluster. I can raise a PR and maybe we can discuss this over there. > WebUI-Executors Page results in "Request is a replay (34) attack" > - > > Key: SPARK-21487 > URL: https://issues.apache.org/jira/browse/SPARK-21487 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.1 >Reporter: ShuMing Li >Priority: Minor > > We recently upgraded Spark from 2.0.2 to 2.1.1, and the Web UI `Executors > Page` became empty, with the exception below. > The `Executors` page is rendered with JavaScript rather than Scala in > 2.1.1, but I don't know why this causes the problem. > Perhaps "two queries submitted at the same time with the same timestamp" may > cause this, but I'm not sure.
> ResourceManager log: > {code:java} > 2017-07-20 20:39:09,371 WARN > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > {code} > Safari browser console: > {code:java} > Failed to load resource: the server responded with a status of 403 > (GSSException: Failure unspecified at GSS-API level (Mechanism level: Request > is a replay > (34)))http://hadoop-rm-host:8088/proxy/application_1494564992156_2751285/static/executorspage-template.html > {code} > Related Links: > https://issues.apache.org/jira/browse/HIVE-12481 > https://issues.apache.org/jira/browse/HADOOP-8830 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
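The rename Santosh describes can be sketched as a one-off script over the UI's static assets. Only `executorspage-template.html` is confirmed by the logs above; the second file name and the directory below are illustrative stand-ins, not Spark's actual resource layout:

```python
import os
import tempfile

# Sketch of the proposed fix: rename Mustache templates from .html to
# .mustache so YARN's auth filter no longer intercepts them as pages.
# A temp dir stands in for Spark's ui/static resource directory.
static_dir = tempfile.mkdtemp()
for name in ("executorspage-template.html", "stagepage-template.html"):
    open(os.path.join(static_dir, name), "w").close()

for fname in os.listdir(static_dir):
    if fname.endswith("-template.html"):
        base = fname[: -len(".html")]
        os.rename(os.path.join(static_dir, fname),
                  os.path.join(static_dir, base + ".mustache"))

print(sorted(os.listdir(static_dir)))
# → ['executorspage-template.mustache', 'stagepage-template.mustache']
```

In the real patch the JavaScript that fetches these templates would need the matching URL change; the extension swap alone is what keeps the AJAX request from looking like an HTML page request to the filter.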
[jira] [Resolved] (SPARK-40087) Support multiple Column drop in R
[ https://issues.apache.org/jira/browse/SPARK-40087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40087. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37526 [https://github.com/apache/spark/pull/37526] > Support multiple Column drop in R > - > > Key: SPARK-40087 > URL: https://issues.apache.org/jira/browse/SPARK-40087 > Project: Spark > Issue Type: New Feature > Components: R >Affects Versions: 3.3.0 >Reporter: Santosh Pingale >Assignee: Santosh Pingale >Priority: Minor > Fix For: 3.4.0 > > > This is a followup on SPARK-39895. The PR previously attempted to adjust > implementation for R as well to match signatures but that part was removed > and we only focused on getting python implementation to behave correctly. > *{{Change supports following operations:}}* > {{df <- select(read.json(jsonPath), "name", "age")}} > {{df$age2 <- df$age}} > {{df1 <- drop(df, df$age, df$name)}} > {{expect_equal(columns(df1), c("age2"))}} > {{df1 <- drop(df, df$age, column("random"))}} > {{expect_equal(columns(df1), c("name", "age2"))}} > {{df1 <- drop(df, df$age, df$name)}} > {{expect_equal(columns(df1), c("age2"))}} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40087) Support multiple Column drop in R
[ https://issues.apache.org/jira/browse/SPARK-40087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40087: Assignee: Santosh Pingale > Support multiple Column drop in R > - > > Key: SPARK-40087 > URL: https://issues.apache.org/jira/browse/SPARK-40087 > Project: Spark > Issue Type: New Feature > Components: R >Affects Versions: 3.3.0 >Reporter: Santosh Pingale >Assignee: Santosh Pingale >Priority: Minor > > This is a followup on SPARK-39895. The PR previously attempted to adjust > implementation for R as well to match signatures but that part was removed > and we only focused on getting python implementation to behave correctly. > *{{Change supports following operations:}}* > {{df <- select(read.json(jsonPath), "name", "age")}} > {{df$age2 <- df$age}} > {{df1 <- drop(df, df$age, df$name)}} > {{expect_equal(columns(df1), c("age2"))}} > {{df1 <- drop(df, df$age, column("random"))}} > {{expect_equal(columns(df1), c("name", "age2"))}} > {{df1 <- drop(df, df$age, df$name)}} > {{expect_equal(columns(df1), c("age2"))}} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40120) Make pyspark.sql.readwriter examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40120: Assignee: Apache Spark > Make pyspark.sql.readwriter examples self-contained > --- > > Key: SPARK-40120 > URL: https://issues.apache.org/jira/browse/SPARK-40120 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40120) Make pyspark.sql.readwriter examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40120: Assignee: (was: Apache Spark) > Make pyspark.sql.readwriter examples self-contained > --- > > Key: SPARK-40120 > URL: https://issues.apache.org/jira/browse/SPARK-40120 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40120) Make pyspark.sql.readwriter examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581347#comment-17581347 ] Apache Spark commented on SPARK-40120: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/37570 > Make pyspark.sql.readwriter examples self-contained > --- > > Key: SPARK-40120 > URL: https://issues.apache.org/jira/browse/SPARK-40120 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39993) Spark on Kubernetes doesn't filter data by date
[ https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanna Liashchuk updated SPARK-39993: Description: I'm creating a Dataset with a column of type date and saving it to S3. When I read it back and apply a where() clause, I've noticed it doesn't return data even though the data is there. Below is the code snippet I'm running {code:java} from pyspark.sql.types import Row from pyspark.sql.functions import * ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", col("date").cast("date")) ds.where("date = '2022-01-01'").show() ds.write.mode("overwrite").parquet("s3a://bucket/test") df = spark.read.format("parquet").load("s3a://bucket/test") df.where("date = '2022-01-01'").show() {code} The first show() returns data, while the second one does not. I've noticed that it's Kubernetes master related, as the same code snippet works fine with master "local" UPD: if the column is used as a partition and has the type "date" there is no filtering problem. was: I'm creating a Dataset with a column of type date and saving it to S3. When I read it back and apply a where() clause, I've noticed it doesn't return data even though the data is there. Below is the code snippet I'm running {code:java} from pyspark.sql.types import Row from pyspark.sql.functions import * ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", col("date").cast("date")) ds.where("date = '2022-01-01'").show() ds.write.mode("overwrite").parquet("s3a://bucket/test") df = spark.read.format("parquet").load("s3a://bucket/test") df.where("date = '2022-01-01'").show() {code} The first show() returns data, while the second one does not. I've noticed that it's Kubernetes master related, as the same code snippet works fine with master "local" UPD: if the column is used as a partition and has the type "date" or is de facto date but has the type "string", there is no filtering problem.
> Spark on Kubernetes doesn't filter data by date > --- > > Key: SPARK-39993 > URL: https://issues.apache.org/jira/browse/SPARK-39993 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.2 > Environment: Kubernetes v1.23.6 > Spark 3.2.2 > Java 1.8.0_312 > Python 3.9.13 > Aws dependencies: > aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar >Reporter: Hanna Liashchuk >Priority: Major > Labels: kubernetes > > I'm creating a Dataset with a column of type date and saving it to S3. When I read it > back and apply a where() clause, I've noticed it doesn't return data even > though the data is there. > Below is the code snippet I'm running > > {code:java} > from pyspark.sql.types import Row > from pyspark.sql.functions import * > ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", > col("date").cast("date")) > ds.where("date = '2022-01-01'").show() > ds.write.mode("overwrite").parquet("s3a://bucket/test") > df = spark.read.format("parquet").load("s3a://bucket/test") > df.where("date = '2022-01-01'").show() > {code} > The first show() returns data, while the second one does not. > I've noticed that it's Kubernetes master related, as the same code snippet > works fine with master "local" > UPD: if the column is used as a partition and has the type "date" there is no > filtering problem. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40138) Implement DataFrame.mode
[ https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40138: Assignee: Apache Spark > Implement DataFrame.mode > > > Key: SPARK-40138 > URL: https://issues.apache.org/jira/browse/SPARK-40138 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40138) Implement DataFrame.mode
[ https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581315#comment-17581315 ] Apache Spark commented on SPARK-40138: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37569 > Implement DataFrame.mode > > > Key: SPARK-40138 > URL: https://issues.apache.org/jira/browse/SPARK-40138 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40138) Implement DataFrame.mode
[ https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581316#comment-17581316 ] Apache Spark commented on SPARK-40138: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37569 > Implement DataFrame.mode > > > Key: SPARK-40138 > URL: https://issues.apache.org/jira/browse/SPARK-40138 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40138) Implement DataFrame.mode
[ https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40138: Assignee: (was: Apache Spark) > Implement DataFrame.mode > > > Key: SPARK-40138 > URL: https://issues.apache.org/jira/browse/SPARK-40138 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40138) Implement DataFrame.mode
Ruifeng Zheng created SPARK-40138: - Summary: Implement DataFrame.mode Key: SPARK-40138 URL: https://issues.apache.org/jira/browse/SPARK-40138 Project: Spark Issue Type: Improvement Components: ps Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35542) Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[ https://issues.apache.org/jira/browse/SPARK-35542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35542: Assignee: Weichen Xu (was: Apache Spark) > Bucketizer created for multiple columns with parameters splitsArray, > inputCols and outputCols can not be loaded after saving it. > - > > Key: SPARK-35542 > URL: https://issues.apache.org/jira/browse/SPARK-35542 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 > Environment: {color:#172b4d}DataBricks Spark 3.1.1{color} >Reporter: Srikanth Pusarla >Assignee: Weichen Xu >Priority: Minor > Attachments: Code-error.PNG, traceback.png > > > Bucketizer created for multiple columns with parameters *splitsArray*, > *inputCols* and *outputCols* can not be loaded after saving it. > The problem is not seen for Bucketizer created for single column. > *Code to reproduce* > ### > from pyspark.ml.feature import Bucketizer > df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,)], ["values"]) > bucketizer = Bucketizer(*splitsArray*= [[-float("inf"), 0.5, 1.4, > float("inf")], [-float("inf"), 0.1, 1.2, float("inf")]], > *inputCols*=["values", "values"], *outputCols*=["b1", "b2"]) > bucketed = bucketizer.transform(df).collect() > dfb = bucketizer.transform(df) > print(dfb.show()) > bucketizerPath = "dbfs:/mnt/S3-Bucket/" + "Bucketizer" > bucketizer.write().overwrite().save(bucketizerPath) > loadedBucketizer = {color:#FF}Bucketizer.load(bucketizerPath) > Failing here{color} > loadedBucketizer.getSplits() == bucketizer.getSplits() > > The error message is > {color:#FF}*TypeError: array() argument 1 must be a unicode character, > not bytes*{color} > > *BackTrace:* > > -- > TypeError Traceback (most recent call last) in 15 > 16 bucketizer.write().overwrite().save(bucketizerPath) > ---> 17 loadedBucketizer = Bucketizer.load(bucketizerPath) > 18 loadedBucketizer.getSplits() == bucketizer.getSplits() > > /databricks/spark/python/pyspark/ml/util.py in 
load(cls, path) > 376 def load(cls, path): > 377 """Reads an ML instance from the input path, a shortcut of > `read().load(path)`.""" > --> 378 return cls.read().load(path) > 379 > 380 > > /databricks/spark/python/pyspark/ml/util.py in load(self, path) > 330 raise NotImplementedError("This Java ML type cannot be loaded into Python > currently: %r" > 331 % self._clazz) > --> 332 return self._clazz._from_java(java_obj) > 333 > 334 > > def session(self, sparkSession): > /databricks/spark/python/pyspark/ml/wrapper.py in _from_java(java_stage) > 258 > 259 py_stage._resetUid(java_stage.uid()) > --> 260 py_stage._transfer_params_from_java() > 261 elif hasattr(py_type, "_from_java"): > 262 py_stage = py_type._from_java(java_stage) > > /databricks/spark/python/pyspark/ml/wrapper.py in > _transfer_params_from_java(self) > 186 # SPARK-14931: Only check set params back to avoid default params > mismatch. > 187 if self._java_obj.isSet(java_param): --> > 188 value = _java2py(sc, self._java_obj.getOrDefault(java_param)) > 189 self._set(**{param.name: value}) > 190 # SPARK-10931: Temporary fix for params that have a default in Java > > /databricks/spark/python/pyspark/ml/common.py in _java2py(sc, r, encoding) > 107 > 108 if isinstance(r, (bytearray, bytes)): > --> 109 r = PickleSerializer().loads(bytes(r), encoding=encoding) > 110 return r > 111 > > /databricks/spark/python/pyspark/serializers.py in loads(self, obj, encoding) > 467 > 468 def loads(self, obj, encoding="bytes"): > --> 469 return pickle.loads(obj, encoding=encoding) > 470 > 471 > > TypeError: array() argument 1 must be a unicode character, not bytes > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35542) Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[ https://issues.apache.org/jira/browse/SPARK-35542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581272#comment-17581272 ] Apache Spark commented on SPARK-35542: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/37568

> Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
>
> Key: SPARK-35542
> URL: https://issues.apache.org/jira/browse/SPARK-35542
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.1.1
> Environment: Databricks Spark 3.1.1
> Reporter: Srikanth Pusarla
> Assignee: Weichen Xu
> Priority: Minor
> Attachments: Code-error.PNG, traceback.png
>
> A Bucketizer created for multiple columns with the parameters splitsArray, inputCols and outputCols cannot be loaded after saving it. The problem is not seen for a Bucketizer created for a single column.
>
> Code to reproduce:
>
> from pyspark.ml.feature import Bucketizer
> df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,)], ["values"])
> bucketizer = Bucketizer(splitsArray=[[-float("inf"), 0.5, 1.4, float("inf")], [-float("inf"), 0.1, 1.2, float("inf")]], inputCols=["values", "values"], outputCols=["b1", "b2"])
> bucketed = bucketizer.transform(df).collect()
> dfb = bucketizer.transform(df)
> print(dfb.show())
> bucketizerPath = "dbfs:/mnt/S3-Bucket/" + "Bucketizer"
> bucketizer.write().overwrite().save(bucketizerPath)
> loadedBucketizer = Bucketizer.load(bucketizerPath)  # Failing here
> loadedBucketizer.getSplits() == bucketizer.getSplits()
>
> The error message is: TypeError: array() argument 1 must be a unicode character, not bytes
>
> Backtrace:
>
> TypeError Traceback (most recent call last)
> in
>      15
>      16 bucketizer.write().overwrite().save(bucketizerPath)
> ---> 17 loadedBucketizer = Bucketizer.load(bucketizerPath)
>      18 loadedBucketizer.getSplits() == bucketizer.getSplits()
>
> /databricks/spark/python/pyspark/ml/util.py in load(cls, path)
>     376 def load(cls, path):
>     377     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
> --> 378     return cls.read().load(path)
>
> /databricks/spark/python/pyspark/ml/util.py in load(self, path)
>     330     raise NotImplementedError("This Java ML type cannot be loaded into Python currently: %r"
>     331                               % self._clazz)
> --> 332     return self._clazz._from_java(java_obj)
>
> /databricks/spark/python/pyspark/ml/wrapper.py in _from_java(java_stage)
>     259     py_stage._resetUid(java_stage.uid())
> --> 260     py_stage._transfer_params_from_java()
>     261 elif hasattr(py_type, "_from_java"):
>     262     py_stage = py_type._from_java(java_stage)
>
> /databricks/spark/python/pyspark/ml/wrapper.py in _transfer_params_from_java(self)
>     186 # SPARK-14931: Only check set params back to avoid default params mismatch.
>     187 if self._java_obj.isSet(java_param):
> --> 188     value = _java2py(sc, self._java_obj.getOrDefault(java_param))
>     189     self._set(**{param.name: value})
>     190 # SPARK-10931: Temporary fix for params that have a default in Java
>
> /databricks/spark/python/pyspark/ml/common.py in _java2py(sc, r, encoding)
>     108 if isinstance(r, (bytearray, bytes)):
> --> 109     r = PickleSerializer().loads(bytes(r), encoding=encoding)
>     110 return r
>
> /databricks/spark/python/pyspark/serializers.py in loads(self, obj, encoding)
>     468 def loads(self, obj, encoding="bytes"):
> --> 469     return pickle.loads(obj, encoding=encoding)
>
> TypeError: array() argument 1 must be a unicode character, not bytes

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
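The TypeError above is characteristic of unpickling legacy `array.array` payloads on Python 3, where `array()` requires a unicode typecode and rejects the bytes typecode carried in the serialized data. A minimal sketch of the failure mode, deliberately independent of Spark (the Bucketizer serialization internals are not reproduced here):

```python
import array

# array() accepts a one-character unicode typecode on Python 3 ...
ok = array.array("d", [0.5, 1.4])
assert list(ok) == [0.5, 1.4]

# ... but rejects a bytes typecode, which is what a payload produced by an
# older serializer can hand it. This is the error surfaced by Bucketizer.load.
try:
    array.array(b"d", [0.5, 1.4])
except TypeError as exc:
    print(exc)  # e.g. "array() argument 1 must be a unicode character, not bytes"
```

This is why the traceback bottoms out in `pickle.loads`: the pickled parameter value reconstructs an `array` with a bytes typecode before PySpark ever sees it.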
[jira] [Assigned] (SPARK-35542) Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[ https://issues.apache.org/jira/browse/SPARK-35542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35542: Assignee: Apache Spark (was: Weichen Xu)

> Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
>
> Key: SPARK-35542
> URL: https://issues.apache.org/jira/browse/SPARK-35542
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.1.1
> Environment: Databricks Spark 3.1.1
> Reporter: Srikanth Pusarla
> Assignee: Apache Spark
> Priority: Minor
> Attachments: Code-error.PNG, traceback.png
[jira] [Assigned] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39271: Assignee: Apache Spark

> Upgrade pandas to 1.4.3
>
> Key: SPARK-39271
> URL: https://issues.apache.org/jira/browse/SPARK-39271
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39271: Assignee: (was: Apache Spark)

> Upgrade pandas to 1.4.3
>
> Key: SPARK-39271
> URL: https://issues.apache.org/jira/browse/SPARK-39271
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Priority: Major
[jira] [Commented] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581266#comment-17581266 ] Apache Spark commented on SPARK-39271: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/37567

> Upgrade pandas to 1.4.3
>
> Key: SPARK-39271
> URL: https://issues.apache.org/jira/browse/SPARK-39271
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Priority: Major
[jira] [Updated] (SPARK-39271) Upgrade pandas to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-39271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-39271: Summary: Upgrade pandas to 1.4.3 (was: Upgrade pandas to 1.4.2)

> Upgrade pandas to 1.4.3
>
> Key: SPARK-39271
> URL: https://issues.apache.org/jira/browse/SPARK-39271
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Priority: Major
[jira] [Assigned] (SPARK-35542) Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[ https://issues.apache.org/jira/browse/SPARK-35542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-35542: Assignee: Weichen Xu

> Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
>
> Key: SPARK-35542
> URL: https://issues.apache.org/jira/browse/SPARK-35542
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.1.1
> Environment: Databricks Spark 3.1.1
> Reporter: Srikanth Pusarla
> Assignee: Weichen Xu
> Priority: Minor
> Attachments: Code-error.PNG, traceback.png
[jira] [Commented] (SPARK-40136) Incorrect fragment of query context
[ https://issues.apache.org/jira/browse/SPARK-40136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581252#comment-17581252 ] Apache Spark commented on SPARK-40136: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37566

> Incorrect fragment of query context
>
> Key: SPARK-40136
> URL: https://issues.apache.org/jira/browse/SPARK-40136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
>
> The query context contains only part of the fragment. The code below demonstrates the issue:
> {code:scala}
> withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
>   val e = intercept[SparkArithmeticException] {
>     sql("select 1 / 0").collect()
>   }
>   println("'" + e.getQueryContext()(0).fragment() + "'")
> }
> '1 / '
> {code}
[jira] [Assigned] (SPARK-40136) Incorrect fragment of query context
[ https://issues.apache.org/jira/browse/SPARK-40136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40136: Assignee: Max Gekk (was: Apache Spark)

> Incorrect fragment of query context
>
> Key: SPARK-40136
> URL: https://issues.apache.org/jira/browse/SPARK-40136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
[jira] [Assigned] (SPARK-40136) Incorrect fragment of query context
[ https://issues.apache.org/jira/browse/SPARK-40136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40136: Assignee: Apache Spark (was: Max Gekk)

> Incorrect fragment of query context
>
> Key: SPARK-40136
> URL: https://issues.apache.org/jira/browse/SPARK-40136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-40136) Incorrect fragment of query context
[ https://issues.apache.org/jira/browse/SPARK-40136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581251#comment-17581251 ] Apache Spark commented on SPARK-40136: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37566

> Incorrect fragment of query context
>
> Key: SPARK-40136
> URL: https://issues.apache.org/jira/browse/SPARK-40136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
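The truncated fragment `'1 / '` is the shape of result an off-by-one in text offsets produces when slicing the original SQL string. The sketch below only illustrates that arithmetic; the index handling is hypothetical and not taken from the Spark source:

```python
sql = "select 1 / 0"

# The expression "1 / 0" occupies offsets 7..11 inclusive.
start = sql.index("1 / 0")                  # 7
stop_inclusive = start + len("1 / 0") - 1   # 11

# Slicing with the inclusive stop instead of stop + 1 drops the last
# character, yielding the truncated fragment reported in the ticket.
print(repr(sql[start:stop_inclusive]))      # '1 / '
print(repr(sql[start:stop_inclusive + 1]))  # '1 / 0'
```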
[jira] [Assigned] (SPARK-40137) Combines limits after projection
[ https://issues.apache.org/jira/browse/SPARK-40137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40137: Assignee: (was: Apache Spark)

> Combines limits after projection
>
> Key: SPARK-40137
> URL: https://issues.apache.org/jira/browse/SPARK-40137
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.2
> Reporter: Xianyang Liu
> Priority: Major
>
> `Dataset.show` adds an extra `Limit` and `Projection` on top of the given logical plan. If the `Dataset` is already a limit job, this introduces an extra shuffle phase, so we should combine the limit after the projection.
> For example:
> ```scala
> spark.sql("select * from spark.store_sales limit 10").show()
> ```
> Before:
> ```
> == Physical Plan ==
> AdaptiveSparkPlan (12)
> +- == Final Plan ==
>    * Project (7)
>    +- * GlobalLimit (6)
>       +- ShuffleQueryStage (5), Statistics(sizeInBytes=185.6 KiB, rowCount=990)
>          +- Exchange (4)
>             +- * LocalLimit (3)
>                +- * ColumnarToRow (2)
>                   +- Scan parquet spark_catalog.spark.store_sales (1)
> +- == Initial Plan ==
>    Project (11)
>    +- GlobalLimit (10)
>       +- Exchange (9)
>          +- LocalLimit (8)
>             +- Scan parquet spark_catalog.spark.store_sales (1)
> ```
> After:
> ```
> == Physical Plan ==
> CollectLimit (4)
> +- * Project (3)
>    +- * ColumnarToRow (2)
>       +- Scan parquet spark_catalog.spark.store_sales (1)
> ```
[jira] [Commented] (SPARK-40137) Combines limits after projection
[ https://issues.apache.org/jira/browse/SPARK-40137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581248#comment-17581248 ] Apache Spark commented on SPARK-40137: -- User 'ConeyLiu' has created a pull request for this issue: https://github.com/apache/spark/pull/37565

> Combines limits after projection
>
> Key: SPARK-40137
> URL: https://issues.apache.org/jira/browse/SPARK-40137
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.2
> Reporter: Xianyang Liu
> Priority: Major
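The proposed rewrite rests on the observation that a projection preserves row count, so `Limit(n, Project(cols, Limit(m, child)))` can collapse to `Project(cols, Limit(min(n, m), child))`, letting a single `CollectLimit` replace the shuffle. A toy model of that rewrite, with illustrative plan classes that are not Spark's:

```python
from dataclasses import dataclass

@dataclass
class Scan:
    table: str

@dataclass
class Project:
    cols: list
    child: object

@dataclass
class Limit:
    n: int
    child: object

def combine_limits(plan):
    """Rewrite Limit(n, Project(cols, Limit(m, c))) -> Project(cols, Limit(min(n, m), c))."""
    if (isinstance(plan, Limit)
            and isinstance(plan.child, Project)
            and isinstance(plan.child.child, Limit)):
        proj, inner = plan.child, plan.child.child
        return Project(proj.cols, Limit(min(plan.n, inner.n), inner.child))
    return plan

# show() wraps the user's "limit 10" query in another Limit + Project
# (21 rows here is an assumed wrapper limit, for illustration only):
before = Limit(21, Project(["*"], Limit(10, Scan("spark.store_sales"))))
after = combine_limits(before)
assert after == Project(["*"], Limit(10, Scan("spark.store_sales")))
```

Since the surviving limit sits directly above the scan, the planner no longer needs a GlobalLimit over an Exchange, which is exactly the Before/After difference shown in the ticket.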