[jira] [Assigned] (SPARK-42021) createDataFrame with array.array

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42021:


Assignee: (was: Apache Spark)

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
>
> def test_array_types(self):
>     # This test need to make sure that the Scala type selected is at least
>     # as large as the python's types. This is necessary because python's
>     # array types depend on C implementation on the machine. Therefore there
>     # is no machine independent correspondence between python's array types
>     # and Scala types.
>     # See: https://docs.python.org/2/library/array.html
>
>     def assertCollectSuccess(typecode, value):
>         row = Row(myarray=array.array(typecode, [value]))
>         df = self.spark.createDataFrame([row])
>         self.assertEqual(df.first()["myarray"][0], value)
>
>     # supported string types
>     #
>     # String types in python's array are "u" for Py_UNICODE and "c" for char.
>     # "u" will be removed in python 4, and "c" is not supported in python 3.
>     supported_string_types = []
>     if sys.version_info[0] < 4:
>         supported_string_types += ["u"]
>     # test unicode
> >   assertCollectSuccess("u", "a")
>
> ../test_types.py:986:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../test_types.py:975: in assertCollectSuccess
>     df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
>     _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
>     ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
>     ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
>     ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
>     ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
>     ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
>     ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}
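
For context, the failure reproduces with pyarrow alone, without Spark Connect in the loop. A minimal sketch (assuming the pyarrow version from the traceback; the list() conversion is one possible client-side workaround, not necessarily what the eventual fix does):

{code:python}
import array

import pyarrow as pa

row = {"myarray": array.array("u", "a")}

# Table.from_pylist must infer an Arrow type for every cell; it does not
# recognize array.array('u', ...), so it raises ArrowInvalid as above.
try:
    pa.Table.from_pylist([row])
except pa.ArrowInvalid as e:
    print(e)

# One possible conversion: turn the array.array into a plain list first,
# which Arrow infers as list<string>.
table = pa.Table.from_pylist([{"myarray": list(row["myarray"])}])
print(table.schema)  # myarray: list<item: string>
{code}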






[jira] [Assigned] (SPARK-42021) createDataFrame with array.array

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42021:


Assignee: Apache Spark

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
>
> def test_array_types(self):
>     # This test need to make sure that the Scala type selected is at least
>     # as large as the python's types. This is necessary because python's
>     # array types depend on C implementation on the machine. Therefore there
>     # is no machine independent correspondence between python's array types
>     # and Scala types.
>     # See: https://docs.python.org/2/library/array.html
>
>     def assertCollectSuccess(typecode, value):
>         row = Row(myarray=array.array(typecode, [value]))
>         df = self.spark.createDataFrame([row])
>         self.assertEqual(df.first()["myarray"][0], value)
>
>     # supported string types
>     #
>     # String types in python's array are "u" for Py_UNICODE and "c" for char.
>     # "u" will be removed in python 4, and "c" is not supported in python 3.
>     supported_string_types = []
>     if sys.version_info[0] < 4:
>         supported_string_types += ["u"]
>     # test unicode
> >   assertCollectSuccess("u", "a")
>
> ../test_types.py:986:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../test_types.py:975: in assertCollectSuccess
>     df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
>     _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
>     ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
>     ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
>     ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
>     ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
>     ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
>     ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}






[jira] [Commented] (SPARK-42021) createDataFrame with array.array

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677210#comment-17677210
 ] 

Apache Spark commented on SPARK-42021:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39602

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
>
> def test_array_types(self):
>     # This test need to make sure that the Scala type selected is at least
>     # as large as the python's types. This is necessary because python's
>     # array types depend on C implementation on the machine. Therefore there
>     # is no machine independent correspondence between python's array types
>     # and Scala types.
>     # See: https://docs.python.org/2/library/array.html
>
>     def assertCollectSuccess(typecode, value):
>         row = Row(myarray=array.array(typecode, [value]))
>         df = self.spark.createDataFrame([row])
>         self.assertEqual(df.first()["myarray"][0], value)
>
>     # supported string types
>     #
>     # String types in python's array are "u" for Py_UNICODE and "c" for char.
>     # "u" will be removed in python 4, and "c" is not supported in python 3.
>     supported_string_types = []
>     if sys.version_info[0] < 4:
>         supported_string_types += ["u"]
>     # test unicode
> >   assertCollectSuccess("u", "a")
>
> ../test_types.py:986:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../test_types.py:975: in assertCollectSuccess
>     df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
>     _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
>     ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
>     ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
>     ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
>     ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
>     ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
>     ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}






[jira] [Commented] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677207#comment-17677207
 ] 

Apache Spark commented on SPARK-42087:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39601

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
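
The ticket has no description, but the change is mechanical: the suite untars downloaded Spark distributions, and without `--no-same-owner` GNU tar running as root restores the owner/group recorded in the archive. A sketch of the flag's effect (the archive name is a placeholder; the suite itself drives tar from Scala):

{code:python}
import subprocess

# Without --no-same-owner, GNU tar run as root restores the UID/GID stored
# in the archive, so extracted files can end up owned by a user that does
# not exist on the test machine. With the flag, the invoking user owns them.
subprocess.run(
    ["tar", "-xzf", "spark-3.3.1-bin-hadoop3.tgz", "--no-same-owner"],
    check=True,
)
{code}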







[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42087:


Assignee: Apache Spark

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677205#comment-17677205
 ] 

Apache Spark commented on SPARK-42087:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39601

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42087:


Assignee: (was: Apache Spark)

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-15 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42087:
-

 Summary: Use `--no-same-owner` when 
HiveExternalCatalogVersionsSuite untars.
 Key: SPARK-42087
 URL: https://issues.apache.org/jira/browse/SPARK-42087
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas

2023-01-15 Thread Pralabh Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677198#comment-17677198
 ] 

Pralabh Kumar commented on SPARK-36728:
---

[~gurwls223] I think this can be closed, as it was fixed as part of SPARK-36742.

> Can't create datetime object from anything other then year column Pyspark - 
> koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
>
> If I create a datetime object, it must be from columns named year, month, day,
> and so on.
>  
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480             if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
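
As with pandas.to_datetime, the fixed column names year/month/day are part of the API contract, so renaming is the usual workaround (the snippet above also passes df instead of df_test, but the naming restriction is the underlying complaint). A minimal sketch, assuming pandas-on-Spark mirrors the pandas behavior here:

{code:python}
import pyspark.pandas as ps

df_test = ps.DataFrame({'testyear': [2015, 2016],
                        'testmonth': [2, 3],
                        'testday': [4, 5]})

# Rename to the names to_datetime expects, then assemble the date column.
renamed = df_test.rename(columns={'testyear': 'year',
                                  'testmonth': 'month',
                                  'testday': 'day'})
renamed['date'] = ps.to_datetime(renamed[['year', 'month', 'day']])
print(renamed.head())
{code}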






[jira] [Commented] (SPARK-40353) Re-enable the `read_excel` tests

2023-01-15 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677193#comment-17677193
 ] 

Haejoon Lee commented on SPARK-40353:
-

Yeah, it should not be a release blocker. Just adjusted the `Priority` to `Minor`.

> Re-enable the `read_excel` tests
> 
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Minor
>
> So far, we've been skipping the `read_excel` test in pandas API on Spark:
> https://github.com/apache/spark/blob/6d2ce128058b439094cd1dd54253372af6977e79/python/pyspark/pandas/tests/test_dataframe_spark_io.py#L251
> In https://github.com/apache/spark/pull/37671, we installed 
> `openpyxl==3.0.10` to re-enable the `read_excel` tests, but it's still 
> failing for some reason (Please see 
> https://github.com/apache/spark/pull/37671#issuecomment-1237515485 for more 
> detail).
> We should re-enable this test to improve the pandas-on-Spark test coverage.
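
For reference, the guard in question follows the standard unittest pattern of skipping on a missing optional dependency, so "re-enabling" means making the dependency importable and the test body pass. A sketch of that pattern (hypothetical test body, not the suite's exact code):

{code:python}
import unittest

try:
    import openpyxl  # noqa: F401
    have_openpyxl = True
except ImportError:
    have_openpyxl = False

class DataFrameSparkIOTest(unittest.TestCase):
    @unittest.skipIf(not have_openpyxl, "openpyxl is required for read_excel")
    def test_read_excel(self):
        ...  # round-trip a small DataFrame through an xlsx file
{code}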






[jira] [Updated] (SPARK-40353) Re-enable the `read_excel` tests

2023-01-15 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-40353:

Priority: Minor  (was: Major)

> Re-enable the `read_excel` tests
> 
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Minor
>
> So far, we've been skipping the `read_excel` test in pandas API on Spark:
> https://github.com/apache/spark/blob/6d2ce128058b439094cd1dd54253372af6977e79/python/pyspark/pandas/tests/test_dataframe_spark_io.py#L251
> In https://github.com/apache/spark/pull/37671, we installed 
> `openpyxl==3.0.10` to re-enable the `read_excel` tests, but it's still 
> failing for some reason (Please see 
> https://github.com/apache/spark/pull/37671#issuecomment-1237515485 for more 
> detail).
> We should re-enable this test to improve the pandas-on-Spark test coverage.






[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41988:


Assignee: (was: Apache Spark)

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
>
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
>
> {code}
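
The root cause is that map entry order in show() output is not deterministic, so the doctest's expected table is brittle. One order-insensitive way to assert the same result (a sketch assuming a running SparkSession named spark; not necessarily how the linked PR fixes the doctests):

{code:python}
from pyspark.sql.functions import map_filter

df = spark.createDataFrame(
    [(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))

# Collect the map as a Python dict; dict equality ignores entry order.
row = df.select(
    map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")).head()
assert row["data_filtered"] == {"foo": 42.0, "baz": 32.0}
{code}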






[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677190#comment-17677190
 ] 

Apache Spark commented on SPARK-41988:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
>
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
>
> {code}






[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41988:


Assignee: Apache Spark

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
>
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
>
> {code}






[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677192#comment-17677192
 ] 

Apache Spark commented on SPARK-41988:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
>
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
>
> {code}






[jira] [Assigned] (SPARK-42032) Map data show in different order

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42032:


Assignee: (was: Apache Spark)

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
>
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
>
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}
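
Same pattern as SPARK-41988: the displayed entry order of a map column is not guaranteed, so a dict comparison is the order-insensitive way to check the result (sketch assuming a running SparkSession named spark):

{code:python}
from pyspark.sql.functions import transform_keys, upper

df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], ("id", "data"))

# Comparing the collected dict sidesteps the nondeterministic display order.
row = df.select(
    transform_keys("data", lambda k, _: upper(k)).alias("data_upper")).head()
assert row["data_upper"] == {"FOO": -2.0, "BAR": 2.0}
{code}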






[jira] [Commented] (SPARK-42032) Map data show in different order

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677189#comment-17677189
 ] 

Apache Spark commented on SPARK-42032:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
>
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
>
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Assigned] (SPARK-42032) Map data show in different order

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42032:


Assignee: Apache Spark

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
>
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
>
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas

2023-01-15 Thread Pralabh Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677187#comment-17677187
 ] 

Pralabh Kumar commented on SPARK-36728:
---

I think this issue is not reproducible on Spark 3.4. Please confirm.

> Can't create datetime object from anything other then year column Pyspark - 
> koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
>
> If I create a datetime object, it must be from columns named year, month, day,
> and so on.
>  
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480             if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25






[jira] [Commented] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677185#comment-17677185
 ] 

Apache Spark commented on SPARK-42086:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39599

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable

2023-01-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42083.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39593
[https://github.com/apache/spark/pull/39593]

> Make (Executor|StatefulSet)PodsAllocator extendable
> ---
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable

2023-01-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42083:
-

Assignee: Dongjoon Hyun

> Make (Executor|StatefulSet)PodsAllocator extendable
> ---
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42086:


Assignee: (was: Apache Spark)

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42086:


Assignee: Apache Spark

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677184#comment-17677184
 ] 

Apache Spark commented on SPARK-42086:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39599

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-15 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42086:
-

 Summary: Sort test cases in SQLQueryTestSuite
 Key: SPARK-42086
 URL: https://issues.apache.org/jira/browse/SPARK-42086
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677176#comment-17677176
 ] 

Apache Spark commented on SPARK-41708:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39598

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> Make WriteFiles hold v1 write information






[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677175#comment-17677175
 ] 

Apache Spark commented on SPARK-41708:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39598

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> Make WriteFiles hold v1 write information






[jira] [Commented] (SPARK-40353) Re-enable the `read_excel` tests

2023-01-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677171#comment-17677171
 ] 

Dongjoon Hyun commented on SPARK-40353:
---

According to the `Priority`, this is not a release blocker, right, [~itholic]? 
Also, cc [~XinrongM].

> Re-enable the `read_excel` tests
> 
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> So far, we've been skipping the `read_excel` test in pandas API on Spark:
> https://github.com/apache/spark/blob/6d2ce128058b439094cd1dd54253372af6977e79/python/pyspark/pandas/tests/test_dataframe_spark_io.py#L251
> In https://github.com/apache/spark/pull/37671, we installed 
> `openpyxl==3.0.10` to re-enable the `read_excel` tests, but it's still 
> failing for some reason (Please see 
> https://github.com/apache/spark/pull/37671#issuecomment-1237515485 for more 
> detail).
> We should re-enable this test to improve the pandas-on-Spark test coverage.






[jira] [Commented] (SPARK-41990) Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677165#comment-17677165
 ] 

Apache Spark commented on SPARK-41990:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/39597

> Filtering by composite field name like `field name` doesn't work with 
> pushDownPredicate = true
> --
>
> Key: SPARK-41990
> URL: https://issues.apache.org/jira/browse/SPARK-41990
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Marina Krasilnikova
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Suppose we have some table in postgresql with a field `Last Name`. The
> following code results in an error:
> Dataset<Row> dataset = sparkSession.read()
>     .format("jdbc")
>     .option("url", myUrl)
>     .option("dbtable", "myTable")
>     .option("user", "myUser")
>     .option("password", "muPassword")
>     .load();
> dataset.where("`Last Name`='Tessel'").show();    // error
>  
> Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
> Syntax error at or near 'Name': extra input 'Name'(line 1, pos 5)
>  
> == SQL ==
> Last Name
> -----^^^
>  
>     at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
>     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
>     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67)
>     at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:40)
>     at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:368)
>     at org.apache.spark.sql.sources.IsNotNull.toV2(filters.scala:262)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1(JDBCRelation.scala:278)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1$adapted(JDBCRelation.scala:278)
>  
> But if we set pushDownPredicate to false, everything works fine.
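
A PySpark sketch of the same repro plus the interim workaround (connection details are placeholders; `pushDownPredicate` is a documented JDBC data source option, and a running SparkSession named spark is assumed):

{code:python}
def read_table(spark, push_down):
    return (spark.read.format("jdbc")
            .option("url", "jdbc:postgresql://localhost:5432/mydb")
            .option("dbtable", "myTable")
            .option("user", "myUser")
            .option("password", "myPassword")
            .option("pushDownPredicate", str(push_down).lower())
            .load())

# Fails with the ParseException above: the pushed-down IsNotNull filter
# re-parses the column name `Last Name` as two identifiers.
read_table(spark, True).where("`Last Name` = 'Tessel'").show()

# Interim workaround: disable JDBC predicate pushdown for this read.
read_table(spark, False).where("`Last Name` = 'Tessel'").show()
{code}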






[jira] [Assigned] (SPARK-42084) Avoid leaking the qualified-access-only restriction

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42084:


Assignee: (was: Apache Spark)

> Avoid leaking the qualified-access-only restriction
> ---
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Commented] (SPARK-42084) Avoid leaking the qualified-access-only restriction

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677164#comment-17677164
 ] 

Apache Spark commented on SPARK-42084:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39596

> Avoid leaking the qualified-access-only restriction
> ---
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Commented] (SPARK-42084) Avoid leaking the qualified-access-only restriction

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677163#comment-17677163
 ] 

Apache Spark commented on SPARK-42084:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39596

> Avoid leaking the qualified-access-only restriction
> ---
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-42084) Avoid leaking the qualified-access-only restriction

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42084:


Assignee: Apache Spark

> Avoid leaking the qualified-access-only restriction
> ---
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2023-01-15 Thread Xiaomin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677162#comment-17677162
 ] 

Xiaomin Zhang commented on SPARK-38230:
---

Hello [~coalchan], thanks for working on this. I created a PR based on your 
work, with some improvements per [~Jackey Lee]'s comment. Now we don't need a 
new parameter, and Spark will only invoke listPartitions when overwriting hive 
static partitions.
[~roczei] Can you please review the PR and let me know if I missed anything? 
Thank you.

> InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions 
> in most cases
> ---
>
> Key: SPARK-38230
> URL: https://issues.apache.org/jira/browse/SPARK-38230
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2
>Reporter: Coal Chan
>Priority: Major
>
> In 
> `org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand`,
>  `sparkSession.sessionState.catalog.listPartitions` calls the hive metastore 
> client method `org.apache.hadoop.hive.metastore.listPartitionsPsWithAuth`, 
> which issues multiple queries per partition against the hive metastore db. So 
> when you insert into a table that has many partitions (e.g. 10k), it produces 
> a huge number of queries on the hive metastore db (n queries per partition * 
> 10k partitions = 10nk), which puts a lot of strain on the database.
> In fact, it calls `listPartitions` only to get the partition locations and 
> compute `customPartitionLocations`. But in most cases there are no custom 
> partition locations, so partition names are enough and we can call 
> `listPartitionNames` instead.
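
The proposed direction, as pseudocode (the helper names here are hypothetical and the real change lives in Spark's Scala code; this only illustrates the cheap-vs-expensive metastore call distinction):

{code:python}
def partitions_for_insert(catalog, table, needs_custom_locations):
    """Fetch only what the insert actually needs from the metastore."""
    if needs_custom_locations:
        # Full partition objects carry locations, but fetching them costs
        # several metastore queries per partition (painful at ~10k partitions).
        return catalog.list_partitions(table)
    # Partition names are enough to decide which partitions exist, and they
    # come back from a single cheap metastore call.
    return catalog.list_partition_names(table)
{code}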






[jira] [Updated] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable

2023-01-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42083:
--
Summary: Make (Executor|StatefulSet)PodsAllocator extendable  (was: Make 
ExecutorPodsAllocator extendable)

> Make (Executor|StatefulSet)PodsAllocator extendable
> ---
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677159#comment-17677159
 ] 

Apache Spark commented on SPARK-38230:
--

User 'czxm' has created a pull request for this issue:
https://github.com/apache/spark/pull/39595

> InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions 
> in most cases
> ---
>
> Key: SPARK-38230
> URL: https://issues.apache.org/jira/browse/SPARK-38230
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2
>Reporter: Coal Chan
>Priority: Major
>
> In 
> `org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand`,
>  `sparkSession.sessionState.catalog.listPartitions` calls the hive metastore 
> client method `org.apache.hadoop.hive.metastore.listPartitionsPsWithAuth`, 
> which issues multiple queries per partition against the hive metastore db. So 
> when you insert into a table that has many partitions (e.g. 10k), it produces 
> a huge number of queries on the hive metastore db (n queries per partition * 
> 10k partitions = 10nk), which puts a lot of strain on the database.
> In fact, it calls `listPartitions` only to get the partition locations and 
> compute `customPartitionLocations`. But in most cases there are no custom 
> partition locations, so partition names are enough and we can call 
> `listPartitionNames` instead.






[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42085:


Assignee: (was: Apache Spark)

> Make `from_arrow_schema` support nested types
> -
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42085:


Assignee: Apache Spark

> Make `from_arrow_schema` support nested types
> -
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677158#comment-17677158
 ] 

Apache Spark commented on SPARK-42085:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39594

> Make `from_arrow_schema` support nested types
> -
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42085:
-

 Summary: Make `from_arrow_schema` support nested types
 Key: SPARK-42085
 URL: https://issues.apache.org/jira/browse/SPARK-42085
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
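
For context, nested support here means mapping Arrow struct/list/map types to 
their Spark equivalents recursively. Below is a minimal sketch with plain 
pyarrow; the `to_spark_ddl` helper is a hypothetical illustration, not the 
real `from_arrow_schema`:

{code:python}
# Sketch: recursively map Arrow types to Spark DDL strings (illustrative).
import pyarrow as pa

def to_spark_ddl(t: pa.DataType) -> str:
    if pa.types.is_struct(t):
        fields = ", ".join(
            f"{t.field(i).name}: {to_spark_ddl(t.field(i).type)}"
            for i in range(t.num_fields))
        return f"struct<{fields}>"
    if pa.types.is_list(t):
        return f"array<{to_spark_ddl(t.value_type)}>"
    if pa.types.is_map(t):
        return f"map<{to_spark_ddl(t.key_type)}, {to_spark_ddl(t.item_type)}>"
    # Only a few leaf types are handled in this sketch:
    return {pa.int64(): "bigint", pa.float64(): "double",
            pa.string(): "string"}[t]

schema = pa.schema([("s", pa.struct([("a", pa.int64()),
                                     ("bs", pa.list_(pa.string()))]))])
print(to_spark_ddl(schema.field("s").type))
# struct<a: bigint, bs: array<string>>
{code}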






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42084) Avoid leaking the qualified-access-only restriction

2023-01-15 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-42084:
---

 Summary: Avoid leaking the qualified-access-only restriction
 Key: SPARK-42084
 URL: https://issues.apache.org/jira/browse/SPARK-42084
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42083) Make ExecutorPodsAllocator extendable

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677156#comment-17677156
 ] 

Apache Spark commented on SPARK-42083:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39593

> Make ExecutorPodsAllocator extendable
> -
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42083) Make ExecutorPodsAllocator extendable

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677155#comment-17677155
 ] 

Apache Spark commented on SPARK-42083:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39593

> Make ExecutorPodsAllocator extendable
> -
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42083) Make ExecutorPodsAllocator extendable

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42083:


Assignee: Apache Spark

> Make ExecutorPodsAllocator extendable
> -
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42083) Make ExecutorPodsAllocator extendable

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42083:


Assignee: (was: Apache Spark)

> Make ExecutorPodsAllocator extendable
> -
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42081) improve the plan change validation

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677154#comment-17677154
 ] 

Apache Spark commented on SPARK-42081:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39592

> improve the plan change validation
> --
>
> Key: SPARK-42081
> URL: https://issues.apache.org/jira/browse/SPARK-42081
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42081) improve the plan change validation

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42081:


Assignee: Apache Spark

> improve the plan change validation
> --
>
> Key: SPARK-42081
> URL: https://issues.apache.org/jira/browse/SPARK-42081
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42081) improve the plan change validation

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42081:


Assignee: (was: Apache Spark)

> improve the plan change validation
> --
>
> Key: SPARK-42081
> URL: https://issues.apache.org/jira/browse/SPARK-42081
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42081) improve the plan change validation

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677152#comment-17677152
 ] 

Apache Spark commented on SPARK-42081:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39592

> improve the plan change validation
> --
>
> Key: SPARK-42081
> URL: https://issues.apache.org/jira/browse/SPARK-42081
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42083) Make ExecutorPodsAllocator extendable

2023-01-15 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42083:
-

 Summary: Make ExecutorPodsAllocator extendable
 Key: SPARK-42083
 URL: https://issues.apache.org/jira/browse/SPARK-42083
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42082) Migrate ValueError into PySparkValueError and manage the functions.py

2023-01-15 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677151#comment-17677151
 ] 

Haejoon Lee commented on SPARK-42082:
-

I'm working on this

> Migrate ValueError into PySparkValueError and manage the functions.py
> -
>
> Key: SPARK-42082
> URL: https://issues.apache.org/jira/browse/SPARK-42082
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all Python built-in exceptions into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41712) Migrate the Spark Connect errors into error classes.

2023-01-15 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677150#comment-17677150
 ] 

Haejoon Lee commented on SPARK-41712:
-

I'm working on this

> Migrate the Spark Connect errors into error classes.
> 
>
> Key: SPARK-41712
> URL: https://issues.apache.org/jira/browse/SPARK-41712
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to migrate the Spark Connect errors into the centralized error 
> framework by leveraging the error-class logic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42082) Migrate ValueError into PySparkValueError and manage the functions.py

2023-01-15 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42082:
---

 Summary: Migrate ValueError into PySparkValueError and manage the 
functions.py
 Key: SPARK-42082
 URL: https://issues.apache.org/jira/browse/SPARK-42082
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


We should migrate all Python built-in exceptions into PySparkException.
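
As background for the migration, the error-class pattern looks roughly like 
this. The classes below are an illustrative sketch, not the actual 
`pyspark.errors` implementation:

{code:python}
# Sketch of the error-class pattern (illustrative, simplified).

class PySparkException(Exception):
    def __init__(self, error_class: str, message_parameters: dict):
        self.error_class = error_class
        self.message_parameters = message_parameters
        params = ", ".join(f"{k}={v}" for k, v in message_parameters.items())
        super().__init__(f"[{error_class}] {params}")

class PySparkValueError(PySparkException, ValueError):
    """Raised instead of a bare ValueError so callers can catch either the
    Python built-in type or the PySpark error hierarchy."""

try:
    raise PySparkValueError(
        error_class="NEGATIVE_VALUE",  # hypothetical error class name
        message_parameters={"arg_name": "numPartitions", "arg_value": "-1"},
    )
except ValueError as e:  # still catchable as a plain ValueError
    print(e)             # [NEGATIVE_VALUE] arg_name=numPartitions, arg_value=-1
{code}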



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42077:
-

Assignee: Ruifeng Zheng

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42077.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39588
[https://github.com/apache/spark/pull/39588]

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42080) Add guideline for PySpark errors.

2023-01-15 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677148#comment-17677148
 ] 

Haejoon Lee commented on SPARK-42080:
-

I'm working on this; we should introduce internal error classes first.

> Add guideline for PySpark errors.
> -
>
> Key: SPARK-42080
> URL: https://issues.apache.org/jira/browse/SPARK-42080
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Add a guideline for PySpark errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42081) improve the plan change validation

2023-01-15 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-42081:
---

 Summary: improve the plan change validation
 Key: SPARK-42081
 URL: https://issues.apache.org/jira/browse/SPARK-42081
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42076:
-

Assignee: Ruifeng Zheng

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42076.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39587
[https://github.com/apache/spark/pull/39587]

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42080) Add guideline for PySpark errors.

2023-01-15 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42080:
---

 Summary: Add guideline for PySpark errors.
 Key: SPARK-42080
 URL: https://issues.apache.org/jira/browse/SPARK-42080
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


Add a guideline for PySpark errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677144#comment-17677144
 ] 

Apache Spark commented on SPARK-42078:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39591

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677143#comment-17677143
 ] 

Apache Spark commented on SPARK-42078:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39591

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42078:


Assignee: Apache Spark

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42078:


Assignee: (was: Apache Spark)

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42032) Map data show in different order

2023-01-15 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677142#comment-17677142
 ] 

Ruifeng Zheng commented on SPARK-42032:
---

I had an offline discussion with [~beliefer]: Spark Connect has the same 
behavior as the Scala Dataset API, which differs from PySpark in some cases:

Dataset API:

{code:java}
scala> spark.createDataFrame(Seq((1, Map("foo" -> -2.0, "bar" -> 
2.0)))).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{foo -> -2.0, bar -> 2.0}|
+---+-------------------------+
{code}


PySpark:

{code:java}
In [2]: spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})]).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{bar -> 2.0, foo -> -2.0}|
+---+-------------------------+
{code}

So this should not be a bug in Connect.
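
Since map field order is not guaranteed, tests that must pass on both backends 
can compare the collected Python dicts instead of the rendered string. A tiny 
sketch of that approach:

{code:python}
# Order-insensitive assertion on a map column.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
row = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})]).first()

# Map columns collect as Python dicts, and dict equality ignores the
# display order seen in show():
assert row["_2"] == {"bar": 2.0, "foo": -2.0}
{code}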

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **
> File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
> line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
> df.select(transform_keys(
> "data", lambda k, _: upper(k)).alias("data_upper")
> ).show(truncate=False)
> Expected:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{BAR -> 2.0, FOO -> -2.0}|
> +-------------------------+
> Got:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{FOO -> -2.0, BAR -> 2.0}|
> +-------------------------+
> 
> **
> File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
> line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
> df.select(transform_values(
> "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
> ).alias("new_data")).show(truncate=False)
> Expected:
> +---------------------------------------+
> |new_data                               |
> +---------------------------------------+
> |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
> +---------------------------------------+
> Got:
> +---------------------------------------+
> |new_data                               |
> +---------------------------------------+
> |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
> +---------------------------------------+
> 
> **
>1 of   2 in pyspark.sql.connect.functions.transform_keys
>1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-15 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677141#comment-17677141
 ] 

jiaan.geng commented on SPARK-41988:


After my investigation, the result from Connect is in fact the same as the 
Dataset API's. This is a bug in PySpark.

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> File 
> "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py",
>  line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
> df.select(map_filter(
> "data", lambda _, v: v > 30.0).alias("data_filtered")
> ).show(truncate=False)
> Expected:
> +--------------------------+
> |data_filtered             |
> +--------------------------+
> |{baz -> 32.0, foo -> 42.0}|
> +--------------------------+
> Got:
> +--------------------------+
> |data_filtered             |
> +--------------------------+
> |{foo -> 42.0, baz -> 32.0}|
> +--------------------------+
> 
> **
> File 
> "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py",
>  line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
> df.select(map_zip_with(
> "base", "ratio", lambda k, v1, v2: round(v1 * v2, 
> 2)).alias("updated_data")
> ).show(truncate=False)
> Expected:
> +---------------------------+
> |updated_data               |
> +---------------------------+
> |{SALES -> 16.8, IT -> 48.0}|
> +---------------------------+
> Got:
> +---------------------------+
> |updated_data               |
> +---------------------------+
> |{IT -> 48.0, SALES -> 16.8}|
> +---------------------------+
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677140#comment-17677140
 ] 

Apache Spark commented on SPARK-42079:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39590

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42079:


Assignee: Apache Spark

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42079:


Assignee: (was: Apache Spark)

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42079) Rename proto messages for `toDF` and `WithColumnsRenamed`

2023-01-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42079:
-

 Summary: Rename proto messages for `toDF` and `WithColumnsRenamed`
 Key: SPARK-42079
 URL: https://issues.apache.org/jira/browse/SPARK-42079
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-42079:
--
Summary: Rename proto messages for `toDF` and `withColumnsRenamed`  (was: 
Rename proto messages for `toDF` and `WithColumnsRenamed`)

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42078:

Summary: Migrate errors thrown by JVM into PySpark Exception.  (was: 
Migrate errors thrown by JVM into single file.)

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.

2023-01-15 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677139#comment-17677139
 ] 

Haejoon Lee commented on SPARK-42078:
-

I'm working on it

> Migrate errors thrown by JVM into PySpark Exception.
> 
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42078) Migrate errors thrown by JVM into single file.

2023-01-15 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42078:

Summary: Migrate errors thrown by JVM into single file.  (was: Migrate 
errors thrown by JVM into PySparkException.)

> Migrate errors thrown by JVM into single file.
> --
>
> Key: SPARK-42078
> URL: https://issues.apache.org/jira/browse/SPARK-42078
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42078) Migrate errors thrown by JVM into PySparkException.

2023-01-15 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42078:
---

 Summary: Migrate errors thrown by JVM into PySparkException.
 Key: SPARK-42078
 URL: https://issues.apache.org/jira/browse/SPARK-42078
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


We should migrate all exceptions raised in PySpark into PySparkException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677131#comment-17677131
 ] 

Apache Spark commented on SPARK-42077:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39588

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42077:


Assignee: Apache Spark

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677129#comment-17677129
 ] 

Apache Spark commented on SPARK-42077:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39588

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42077:


Assignee: (was: Apache Spark)

> Literal should throw TypeError for unsupported DataType
> ---
>
> Key: SPARK-42077
> URL: https://issues.apache.org/jira/browse/SPARK-42077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42077) Literal should throw TypeError for unsupported DataType

2023-01-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42077:
-

 Summary: Literal should throw TypeError for unsupported DataType
 Key: SPARK-42077
 URL: https://issues.apache.org/jira/browse/SPARK-42077
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677127#comment-17677127
 ] 

Apache Spark commented on SPARK-42076:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39587

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677126#comment-17677126
 ] 

Apache Spark commented on SPARK-42076:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39587

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42076:


Assignee: (was: Apache Spark)

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42076:


Assignee: Apache Spark

> Factor data conversion `arrow -> rows` out to `conversion.py`
> -
>
> Key: SPARK-42076
> URL: https://issues.apache.org/jira/browse/SPARK-42076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42076:
-

 Summary: Factor data conversion `arrow -> rows` out to 
`conversion.py`
 Key: SPARK-42076
 URL: https://issues.apache.org/jira/browse/SPARK-42076
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
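
The conversion being factored out is, at its core, the step below; a sketch 
with plain pyarrow (the real helper in `conversion.py` also handles nested and 
temporal types):

{code:python}
# Sketch: turn a pyarrow Table into pyspark Rows.
import pyarrow as pa
from pyspark.sql import Row

table = pa.table({"id": [1, 2], "name": ["a", "b"]})

# to_pylist() yields one dict per record; each becomes a Row:
rows = [Row(**record) for record in table.to_pylist()]
print(rows)  # [Row(id=1, name='a'), Row(id=2, name='b')]
{code}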






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42075) Deprecate DStream API

2023-01-15 Thread Chaoqin Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677114#comment-17677114
 ] 

Chaoqin Li commented on SPARK-42075:


[~kabhwan], sure, I can take this. Thanks!

> Deprecate DStream API
> -
>
> Key: SPARK-42075
> URL: https://issues.apache.org/jira/browse/SPARK-42075
> Project: Spark
>  Issue Type: Task
>  Components: DStreams
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>
> We've reached consensus to deprecate the DStream API in 3.4.0.
> [https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x]
> This Jira ticket is to track the effort of action items:
>  * Add a "deprecation" annotation to the user-facing public API in the 
> streaming directory (DStream); a minimal sketch follows after this list
>  * Write a release note to explicitly mention the deprecation. (Maybe promote 
> again that users are encouraged to move to Structured Streaming.)
>  
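
A minimal sketch of the first action item on the Python side; the class below 
is a stand-in for illustration, not the actual patch:

{code:python}
# Sketch: warn at construction time that the DStream API is deprecated.
import warnings

class StreamingContext:  # stand-in for pyspark.streaming.StreamingContext
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "DStream is deprecated as of Spark 3.4.0. "
            "Migrate to Structured Streaming.",
            FutureWarning,
            stacklevel=2,
        )

StreamingContext()  # emits the FutureWarning at the call site
{code}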



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42032) Map data show in different order

2023-01-15 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677106#comment-17677106
 ] 

jiaan.geng commented on SPARK-42032:


This issue duplicates https://issues.apache.org/jira/browse/SPARK-41988
I'm working on it now.

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **
> File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
> line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
> df.select(transform_keys(
> "data", lambda k, _: upper(k)).alias("data_upper")
> ).show(truncate=False)
> Expected:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{BAR -> 2.0, FOO -> -2.0}|
> +-------------------------+
> Got:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{FOO -> -2.0, BAR -> 2.0}|
> +-------------------------+
> 
> **
> File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
> line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
> df.select(transform_values(
> "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
> ).alias("new_data")).show(truncate=False)
> Expected:
> +---------------------------------------+
> |new_data                               |
> +---------------------------------------+
> |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
> +---------------------------------------+
> Got:
> +---------------------------------------+
> |new_data                               |
> +---------------------------------------+
> |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
> +---------------------------------------+
> 
> **
>1 of   2 in pyspark.sql.connect.functions.transform_keys
>1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41586:


Assignee: Haejoon Lee

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Introduce new package `pyspark.errors` for improving PySpark error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41586.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39387
[https://github.com/apache/spark/pull/39387]

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> Introduce new package `pyspark.errors` for improving PySpark error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41903) Support data type ndarray

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41903:


Assignee: Ruifeng Zheng  (was: Sandeep Singh)

> Support data type ndarray
> -
>
> Key: SPARK-41903
> URL: https://issues.apache.org/jira/browse/SPARK-41903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> import numpy as np
> arr_dtype_to_spark_dtypes = [
> ("int8", [("b", "array")]),
> ("int16", [("b", "array")]),
> ("int32", [("b", "array")]),
> ("int64", [("b", "array")]),
> ("float32", [("b", "array")]),
> ("float64", [("b", "array")]),
> ]
> for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes:
> arr = np.array([1, 2]).astype(t)
> self.assertEqual(
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
> )
> arr = np.array([1, 2]).astype(np.uint)
> with self.assertRaisesRegex(
> TypeError, "The type of array scalar '%s' is not supported" % arr.dtype
> ):
> self.spark.range(1).select(lit(arr).alias("b")){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 1100, in test_ndarray_input
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line 
> 332, in wrapped
> return getattr(functions, f.__name__)(*args, **kwargs)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 198, in lit
> return Column(LiteralExpression._from_value(col))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 266, in _from_value
> return LiteralExpression(value=value, 
> dataType=LiteralExpression._infer_type(value))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 262, in _infer_type
> raise ValueError(f"Unsupported Data Type {type(value).__name__}")
> ValueError: Unsupported Data Type ndarray {code}
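
For reference, the fix amounts to a dtype-to-element-type mapping plus a 
TypeError fallback. A sketch using numpy only; the mapping dict is an 
assumption matching the test above, not the actual Connect code:

{code:python}
# Sketch: map ndarray dtypes to Spark element types; unsupported -> TypeError.
import numpy as np

_NUMPY_TO_SPARK = {
    np.dtype("int8"): "tinyint",
    np.dtype("int16"): "smallint",
    np.dtype("int32"): "int",
    np.dtype("int64"): "bigint",
    np.dtype("float32"): "float",
    np.dtype("float64"): "double",
}

def spark_array_type(arr: np.ndarray) -> str:
    if arr.dtype not in _NUMPY_TO_SPARK:
        raise TypeError(
            "The type of array scalar '%s' is not supported" % arr.dtype)
    return f"array<{_NUMPY_TO_SPARK[arr.dtype]}>"

print(spark_array_type(np.array([1, 2], dtype="int16")))  # array<smallint>
try:
    spark_array_type(np.array([1, 2]).astype(np.uint))
except TypeError as e:
    print(e)  # The type of array scalar 'uint64' is not supported (on 64-bit)
{code}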



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41903) Support data type ndarray

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41903.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39570
[https://github.com/apache/spark/pull/39570]

> Support data type ndarray
> -
>
> Key: SPARK-41903
> URL: https://issues.apache.org/jira/browse/SPARK-41903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> import numpy as np
> arr_dtype_to_spark_dtypes = [
> ("int8", [("b", "array")]),
> ("int16", [("b", "array")]),
> ("int32", [("b", "array")]),
> ("int64", [("b", "array")]),
> ("float32", [("b", "array")]),
> ("float64", [("b", "array")]),
> ]
> for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes:
> arr = np.array([1, 2]).astype(t)
> self.assertEqual(
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
> )
> arr = np.array([1, 2]).astype(np.uint)
> with self.assertRaisesRegex(
> TypeError, "The type of array scalar '%s' is not supported" % arr.dtype
> ):
> self.spark.range(1).select(lit(arr).alias("b")){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 1100, in test_ndarray_input
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line 
> 332, in wrapped
> return getattr(functions, f.__name__)(*args, **kwargs)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 198, in lit
> return Column(LiteralExpression._from_value(col))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 266, in _from_value
> return LiteralExpression(value=value, 
> dataType=LiteralExpression._infer_type(value))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 262, in _infer_type
> raise ValueError(f"Unsupported Data Type {type(value).__name__}")
> ValueError: Unsupported Data Type ndarray {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41903) Support data type ndarray

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41903:


Assignee: Sandeep Singh

> Support data type ndarray
> -
>
> Key: SPARK-41903
> URL: https://issues.apache.org/jira/browse/SPARK-41903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Major
>
> {code:java}
> import numpy as np
> arr_dtype_to_spark_dtypes = [
> ("int8", [("b", "array")]),
> ("int16", [("b", "array")]),
> ("int32", [("b", "array")]),
> ("int64", [("b", "array")]),
> ("float32", [("b", "array")]),
> ("float64", [("b", "array")]),
> ]
> for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes:
> arr = np.array([1, 2]).astype(t)
> self.assertEqual(
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
> )
> arr = np.array([1, 2]).astype(np.uint)
> with self.assertRaisesRegex(
> TypeError, "The type of array scalar '%s' is not supported" % arr.dtype
> ):
> self.spark.range(1).select(lit(arr).alias("b")){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 1100, in test_ndarray_input
> expected_spark_dtypes, 
> self.spark.range(1).select(lit(arr).alias("b")).dtypes
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line 
> 332, in wrapped
> return getattr(functions, f.__name__)(*args, **kwargs)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 198, in lit
> return Column(LiteralExpression._from_value(col))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 266, in _from_value
> return LiteralExpression(value=value, 
> dataType=LiteralExpression._infer_type(value))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py",
>  line 262, in _infer_type
> raise ValueError(f"Unsupported Data Type {type(value).__name__}")
> ValueError: Unsupported Data Type ndarray {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42073) Enable pyspark.sql.tests.test_types 2 test cases

2023-01-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42073.
--
Resolution: Fixed

Issue resolved by pull request 39583
[https://github.com/apache/spark/pull/39583]

> Enable pyspark.sql.tests.test_types 2 test cases
> 
>
> Key: SPARK-42073
> URL: https://issues.apache.org/jira/browse/SPARK-42073
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration

2023-01-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42074.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39584
[https://github.com/apache/spark/pull/39584]

> Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
> --
>
> Key: SPARK-42074
> URL: https://issues.apache.org/jira/browse/SPARK-42074
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>
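
For context, the configuration this change enables is the standard Kryo pair; 
a minimal sketch (the benchmark-specific wiring is omitted):

{code:python}
# Enable Kryo and make it fail fast on unregistered classes.
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # With registrationRequired=true, serializing any class that was not
    # explicitly registered raises an error instead of silently degrading:
    .set("spark.kryo.registrationRequired", "true")
)
print(conf.get("spark.serializer"))
{code}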




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration

2023-01-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42074:
-

Assignee: Dongjoon Hyun

> Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
> --
>
> Key: SPARK-42074
> URL: https://issues.apache.org/jira/browse/SPARK-42074
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-01-15 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677094#comment-17677094
 ] 

Sandeep Singh commented on SPARK-42002:
---

I'm working on this

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = <ReadwriterV2ParityTests testMethod=test_api>
> def test_api(self):
> df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}
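For reference, a sketch of the DataFrameWriterV2 surface the parity test exercises. These calls exist in classic PySpark; at the time of this report the Connect DataFrame raised NotImplementedError as shown above. The table name "testcat.t" comes from the test itself.

{code}
# Classic PySpark DataFrameWriterV2 usage that Connect needs to support
# (sketch under the assumption that `df` and the "testcat" catalog exist).
writer = df.writeTo("testcat.t")
writer.using("parquet").create()               # create a new v2 table
df.writeTo("testcat.t").append()               # append rows to it
df.writeTo("testcat.t").overwritePartitions()  # dynamic partition overwrite
{code}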



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42075) Deprecate DStream API

2023-01-15 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677093#comment-17677093
 ] 

Jungtaek Lim commented on SPARK-42075:
--

[~Chaoqin] Could you please take this over? Thanks in advance!

> Deprecate DStream API
> -
>
> Key: SPARK-42075
> URL: https://issues.apache.org/jira/browse/SPARK-42075
> Project: Spark
>  Issue Type: Task
>  Components: DStreams
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>
> We've reached consensus to deprecate the DStream API in 3.4.0.
> [https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x]
> This Jira ticket tracks the following action items:
>  * Add a deprecation annotation to the user-facing public API in the streaming 
> directory (DStream); see the sketch after this list.
>  * Write a release note that explicitly mentions the deprecation (and encourages 
> users again to migrate to Structured Streaming).
>  
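As an illustration of the first action item, a hypothetical Python-side sketch. The helper name below is an assumption, not the actual patch, and the real change would also annotate the Scala DStream API with @deprecated.

{code}
# Hypothetical sketch of a deprecation warning for pyspark.streaming;
# the helper name and placement are assumptions for illustration only.
import warnings

def _dstream_deprecation_warning() -> None:
    # Emitted once from the DStream entry points so existing jobs keep
    # running but users see the migration notice.
    warnings.warn(
        "DStream is deprecated as of Spark 3.4.0. "
        "Migrate to Structured Streaming instead.",
        FutureWarning,
        stacklevel=3,
    )
{code}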



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42075) Deprecate DStream API

2023-01-15 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-42075:


 Summary: Deprecate DStream API
 Key: SPARK-42075
 URL: https://issues.apache.org/jira/browse/SPARK-42075
 Project: Spark
  Issue Type: Task
  Components: DStreams
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


We've reached consensus to deprecate the DStream API in 3.4.0.

[https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x]

This Jira ticket tracks the following action items:
 * Add a deprecation annotation to the user-facing public API in the streaming 
directory (DStream)
 * Write a release note that explicitly mentions the deprecation (and encourages 
users again to migrate to Structured Streaming)

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration

2023-01-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677089#comment-17677089
 ] 

Apache Spark commented on SPARK-42074:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39584

> Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
> --
>
> Key: SPARK-42074
> URL: https://issues.apache.org/jira/browse/SPARK-42074
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


