[jira] [Assigned] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42021:
------------------------------------

    Assignee: (was: Apache Spark)

> createDataFrame with array.array
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
>
>     def test_array_types(self):
>         # This test need to make sure that the Scala type selected is at least
>         # as large as the python's types. This is necessary because python's
>         # array types depend on C implementation on the machine. Therefore there
>         # is no machine independent correspondence between python's array types
>         # and Scala types.
>         # See: https://docs.python.org/2/library/array.html
>
>         def assertCollectSuccess(typecode, value):
>             row = Row(myarray=array.array(typecode, [value]))
>             df = self.spark.createDataFrame([row])
>             self.assertEqual(df.first()["myarray"][0], value)
>
>         # supported string types
>         #
>         # String types in python's array are "u" for Py_UNICODE and "c" for char.
>         # "u" will be removed in python 4, and "c" is not supported in python 3.
>         supported_string_types = []
>         if sys.version_info[0] < 4:
>             supported_string_types += ["u"]
>             # test unicode
>
>             assertCollectSuccess("u", "a")
>
> ../test_types.py:986:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../test_types.py:975: in assertCollectSuccess
>     df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
>     _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
>     ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
>     ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
>     ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
>     ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
>     ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
>     ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
>     ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type
>
> pyarrow/error.pxi:100: ArrowInvalid
> {code}
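For context, here is a minimal reproduction of the failure plus a possible client-side workaround, sketched in Python. It assumes only `pyarrow` and the standard-library `array` module; the dict-conversion helper is illustrative and is not necessarily the fix adopted in the linked pull request.

{code}
import array

import pyarrow as pa

# Reproduce the inference failure from the stack trace above: with the pyarrow
# version used in the report, an Arrow type cannot be inferred for a
# unicode-typecode array.array value.
row = {"myarray": array.array("u", ["a"])}
try:
    pa.Table.from_pylist([row])
except pa.ArrowInvalid as e:
    print(e)

# Illustrative workaround: convert array.array values to plain lists so the
# element type can be inferred by Arrow.
converted = {k: list(v) if isinstance(v, array.array) else v for k, v in row.items()}
table = pa.Table.from_pylist([converted])
print(table.schema)  # expected: myarray: list<item: string>
{code}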
[jira] [Assigned] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42021:
------------------------------------

    Assignee: Apache Spark

> createDataFrame with array.array
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677210#comment-17677210 ]

Apache Spark commented on SPARK-42021:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39602

> createDataFrame with array.array
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
[jira] [Commented] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677207#comment-17677207 ]

Apache Spark commented on SPARK-42087:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39601

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42087:
------------------------------------

    Assignee: Apache Spark

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677205#comment-17677205 ]

Apache Spark commented on SPARK-42087:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39601

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42087:
------------------------------------

    Assignee: (was: Apache Spark)

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Created] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
Dongjoon Hyun created SPARK-42087:
----------------------------------

Summary: Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
Key: SPARK-42087
URL: https://issues.apache.org/jira/browse/SPARK-42087
Project: Spark
Issue Type: Test
Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun
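For readers unfamiliar with the flag, here is a small illustrative sketch of the kind of extraction the suite performs. The archive name and target directory are hypothetical, and the motivation (avoiding ownership problems when tar runs as root, e.g. in CI containers) is inferred rather than stated in the ticket.

{code}
import subprocess

# Illustrative only: extract a downloaded Spark release. GNU tar's
# --no-same-owner makes the extracted files owned by the invoking user instead
# of restoring the owner recorded in the archive, which is otherwise the
# default behaviour when tar runs as root.
subprocess.run(
    ["tar", "-xzf", "spark-3.3.1-bin-hadoop3.tgz", "--no-same-owner",
     "-C", "/tmp/test-spark-versions"],
    check=True,
)
{code}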
[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas
[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677198#comment-17677198 ]

Pralabh Kumar commented on SPARK-36728:
---------------------------------------

[~gurwls223] I think this can be closed, as it was fixed as part of SPARK-36742.

> Can't create datetime object from anything other then year column Pyspark - koalas
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
> If I create a datetime object, it must be from columns named year.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3], 'day': [4, 5],
>                    'hour': [2, 3], 'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3], 'testday': [4, 5],
>                         'hour': [2, 3], 'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
>
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352         if not found:
>    1353             if missing_keys is None:
> -> 1354                 raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355             else:
>    1356                 missing_keys.append(key)
>
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
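A sketch of a possible workaround for the report above, assuming the intent is to assemble a datetime column from the test* columns: `ps.to_datetime` (mirroring pandas) builds datetimes from columns named year/month/day, so the columns can simply be renamed before the call. The data values are taken from the report; the rename-based approach is illustrative, not the fix referenced in SPARK-36742.

{code}
import pyspark.pandas as ps

# Columns and values mirror the report; only the names differ from what
# to_datetime expects.
df_test = ps.DataFrame({
    "testyear": [2015, 2016], "testmonth": [2, 3], "testday": [4, 5],
    "hour": [2, 3], "minute": [10, 30], "second": [21, 25],
})

# Rename to the column names to_datetime knows how to assemble, then build the
# datetime column the same way the working example in the report does.
renamed = df_test.rename(
    columns={"testyear": "year", "testmonth": "month", "testday": "day"}
)
renamed["date"] = ps.to_datetime(renamed[["year", "month", "day"]])
print(renamed.head())
{code}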
[jira] [Commented] (SPARK-40353) Re-enable the `read_excel` tests
[ https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677193#comment-17677193 ]

Haejoon Lee commented on SPARK-40353:
-------------------------------------

Yeah, it should not be a release blocker. I've just adjusted the `Priority` to `Minor`.

> Re-enable the `read_excel` tests
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Minor
>
> So far, we've been skipping the `read_excel` test in pandas API on Spark:
> https://github.com/apache/spark/blob/6d2ce128058b439094cd1dd54253372af6977e79/python/pyspark/pandas/tests/test_dataframe_spark_io.py#L251
>
> In https://github.com/apache/spark/pull/37671, we installed `openpyxl==3.0.10` to re-enable the
> `read_excel` tests, but it's still failing for some reason (please see
> https://github.com/apache/spark/pull/37671#issuecomment-1237515485 for more detail).
>
> We should re-enable this test to improve the pandas-on-Spark test coverage.
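For context, a minimal usage sketch of the API whose tests are being re-enabled; reading `.xlsx` files goes through the optional `openpyxl` dependency mentioned above. The file path is a placeholder, not something from the ticket.

{code}
import pyspark.pandas as ps

# Minimal read_excel usage; requires openpyxl to be installed for .xlsx input.
pdf = ps.read_excel("/tmp/example.xlsx", sheet_name=0)
print(pdf.head())
{code}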
[jira] [Updated] (SPARK-40353) Re-enable the `read_excel` tests
[ https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-40353:
--------------------------------

    Priority: Minor  (was: Major)

> Re-enable the `read_excel` tests
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Minor
[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41988:
------------------------------------

    Assignee: (was: Apache Spark)

> Fix map_filter and map_zip_with output order
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
>
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
> {code}
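One way to make such map results comparable regardless of key order is to expand the map into entries and sort them before displaying or asserting. The sketch below shows that idea in PySpark; it is illustrative and not necessarily the approach taken to fix the doctests, and it assumes a running Spark session.

{code}
from pyspark.sql import SparkSession
from pyspark.sql.functions import map_filter, map_entries, sort_array

spark = SparkSession.builder.getOrCreate()

# Sample data similar in shape to the doctest above.
df = spark.createDataFrame(
    [(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data")
)

# map_filter keeps entries with value > 30.0; the resulting map's key order is
# not guaranteed, so convert it to a sorted array of entries for display.
filtered = df.select(
    map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")
)
filtered.select(
    sort_array(map_entries("data_filtered")).alias("entries")
).show(truncate=False)
# entries: [{baz, 32.0}, {foo, 42.0}] -- order is now deterministic
{code}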
[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677190#comment-17677190 ]

Apache Spark commented on SPARK-41988:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Fix map_filter and map_zip_with output order
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41988:
------------------------------------

    Assignee: Apache Spark

> Fix map_filter and map_zip_with output order
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677192#comment-17677192 ]

Apache Spark commented on SPARK-41988:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Fix map_filter and map_zip_with output order
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Assigned] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42032:
------------------------------------

    Assignee: (was: Apache Spark)

> Map data show in different order
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> not sure whether this needs to be fixed:
>
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
>
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
>
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}
[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677189#comment-17677189 ]

Apache Spark commented on SPARK-42032:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

> Map data show in different order
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Assigned] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42032:
------------------------------------

    Assignee: Apache Spark

> Map data show in different order
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas
[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677187#comment-17677187 ]

Pralabh Kumar commented on SPARK-36728:
---------------------------------------

I think this issue is not reproducible on Spark 3.4. Please confirm.

> Can't create datetime object from anything other then year column Pyspark - koalas
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
[jira] [Commented] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677185#comment-17677185 ]

Apache Spark commented on SPARK-42086:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39599

> Sort test cases in SQLQueryTestSuite
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Resolved] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-42083.
-----------------------------------

    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39593
[https://github.com/apache/spark/pull/39593]

> Make (Executor|StatefulSet)PodsAllocator extendable
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-42083:
-------------------------------------

    Assignee: Dongjoon Hyun

> Make (Executor|StatefulSet)PodsAllocator extendable
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42086:
------------------------------------

    Assignee: (was: Apache Spark)

> Sort test cases in SQLQueryTestSuite
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42086:
------------------------------------

    Assignee: Apache Spark

> Sort test cases in SQLQueryTestSuite
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677184#comment-17677184 ]

Apache Spark commented on SPARK-42086:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39599

> Sort test cases in SQLQueryTestSuite
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Created] (SPARK-42086) Sort test cases in SQLQueryTestSuite
Dongjoon Hyun created SPARK-42086:
----------------------------------

Summary: Sort test cases in SQLQueryTestSuite
Key: SPARK-42086
URL: https://issues.apache.org/jira/browse/SPARK-42086
Project: Spark
Issue Type: Improvement
Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677176#comment-17677176 ]

Apache Spark commented on SPARK-41708:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39598

> Pull v1write information to WriteFiles
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> Make WriteFiles hold v1 write information
[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677175#comment-17677175 ]

Apache Spark commented on SPARK-41708:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39598

> Pull v1write information to WriteFiles
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> Make WriteFiles hold v1 write information
[jira] [Commented] (SPARK-40353) Re-enable the `read_excel` tests
[ https://issues.apache.org/jira/browse/SPARK-40353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677171#comment-17677171 ]

Dongjoon Hyun commented on SPARK-40353:
---------------------------------------

According to the `Priority`, this is not a release blocker, right, [~itholic]? Also, cc [~XinrongM].

> Re-enable the `read_excel` tests
>
> Key: SPARK-40353
> URL: https://issues.apache.org/jira/browse/SPARK-40353
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
[jira] [Commented] (SPARK-41990) Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true
[ https://issues.apache.org/jira/browse/SPARK-41990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677165#comment-17677165 ]

Apache Spark commented on SPARK-41990:
--------------------------------------

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/39597

> Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true
>
> Key: SPARK-41990
> URL: https://issues.apache.org/jira/browse/SPARK-41990
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1
> Reporter: Marina Krasilnikova
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.4.0
>
> Suppose we have some table in postgresql with a field `Last Name`. The following code results in an error:
>
> Dataset dataset = sparkSession.read()
>     .format("jdbc")
>     .option("url", myUrl)
>     .option("dbtable", "myTable")
>     .option("user", "myUser")
>     .option("password", "muPassword")
>     .load();
> dataset.where("`Last Name`='Tessel'").show();  // error
>
> Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
> Syntax error at or near 'Name': extra input 'Name'(line 1, pos 5)
>
> == SQL ==
> Last Name
> -^^^
>
>   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67)
>   at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:40)
>   at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:368)
>   at org.apache.spark.sql.sources.IsNotNull.toV2(filters.scala:262)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1(JDBCRelation.scala:278)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1$adapted(JDBCRelation.scala:278)
>
> But if we set pushDownPredicate to false everything works fine.
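The workaround stated at the end of the report, sketched here in PySpark rather than Java: with predicate pushdown disabled, the filter on the back-quoted column name is evaluated by Spark instead of being translated for the JDBC source. The URL, table name, and credentials are placeholders, and a running Spark session is assumed.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Disable JDBC predicate pushdown so the filter on `Last Name` is applied by
# Spark after the rows are read, avoiding the FieldReference parse error.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "myTable")
    .option("user", "myUser")
    .option("password", "myPassword")
    .option("pushDownPredicate", "false")
    .load()
)
df.where("`Last Name` = 'Tessel'").show()
{code}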
[jira] [Assigned] (SPARK-42084) Avoid leaking the qualified-access-only restriction
[ https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42084:
------------------------------------

    Assignee: (was: Apache Spark)

> Avoid leaking the qualified-access-only restriction
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Wenchen Fan
> Priority: Major
[jira] [Commented] (SPARK-42084) Avoid leaking the qualified-access-only restriction
[ https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677164#comment-17677164 ]

Apache Spark commented on SPARK-42084:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39596

> Avoid leaking the qualified-access-only restriction
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Wenchen Fan
> Priority: Major
[jira] [Commented] (SPARK-42084) Avoid leaking the qualified-access-only restriction
[ https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677163#comment-17677163 ]

Apache Spark commented on SPARK-42084:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39596

> Avoid leaking the qualified-access-only restriction
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Wenchen Fan
> Priority: Major
[jira] [Assigned] (SPARK-42084) Avoid leaking the qualified-access-only restriction
[ https://issues.apache.org/jira/browse/SPARK-42084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42084:
------------------------------------

    Assignee: Apache Spark

> Avoid leaking the qualified-access-only restriction
>
> Key: SPARK-42084
> URL: https://issues.apache.org/jira/browse/SPARK-42084
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Wenchen Fan
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases
[ https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677162#comment-17677162 ]

Xiaomin Zhang commented on SPARK-38230:
---------------------------------------

Hello [~coalchan], thanks for working on this. I created a PR based on your work with some improvements, as per [~Jackey Lee]'s comment. Now we don't need a new parameter, and Spark will only invoke listPartitions when overwriting Hive static partitions. [~roczei], can you please review the PR and let me know if I missed anything? Thank you.

> InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases
>
> Key: SPARK-38230
> URL: https://issues.apache.org/jira/browse/SPARK-38230
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.2
> Reporter: Coal Chan
> Priority: Major
>
> In `org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand`,
> `sparkSession.sessionState.catalog.listPartitions` will call the method
> `org.apache.hadoop.hive.metastore.listPartitionsPsWithAuth` of the Hive metastore client.
> This method produces multiple queries per partition on the Hive metastore database, so when
> you insert into a table which has too many partitions (i.e. 10k), it produces too many queries
> on the Hive metastore database (i.e. n * 10k = 10nk), which puts a lot of strain on the database.
>
> In fact, it calls the method `listPartitions` in order to get the locations of partitions and
> compute `customPartitionLocations`. But in most cases we do not have custom partitions, so we
> can just get partition names instead, by calling the method `listPartitionNames`.
[jira] [Updated] (SPARK-42083) Make (Executor|StatefulSet)PodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-42083:
----------------------------------

    Summary: Make (Executor|StatefulSet)PodsAllocator extendable  (was: Make ExecutorPodsAllocator extendable)

> Make (Executor|StatefulSet)PodsAllocator extendable
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Commented] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases
[ https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677159#comment-17677159 ]

Apache Spark commented on SPARK-38230:
--------------------------------------

User 'czxm' has created a pull request for this issue:
https://github.com/apache/spark/pull/39595

> InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases
>
> Key: SPARK-38230
> URL: https://issues.apache.org/jira/browse/SPARK-38230
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.2
> Reporter: Coal Chan
> Priority: Major
[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types
[ https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42085:
------------------------------------

    Assignee: (was: Apache Spark)

> Make `from_arrow_schema` support nested types
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types
[ https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42085:
------------------------------------

    Assignee: Apache Spark

> Make `from_arrow_schema` support nested types
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-42085) Make `from_arrow_schema` support nested types
[ https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677158#comment-17677158 ]

Apache Spark commented on SPARK-42085:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39594

> Make `from_arrow_schema` support nested types
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Created] (SPARK-42085) Make `from_arrow_schema` support nested types
Ruifeng Zheng created SPARK-42085:
----------------------------------

Summary: Make `from_arrow_schema` support nested types
Key: SPARK-42085
URL: https://issues.apache.org/jira/browse/SPARK-42085
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-42084) Avoid leaking the qualified-access-only restriction
Wenchen Fan created SPARK-42084:
--------------------------------

Summary: Avoid leaking the qualified-access-only restriction
Key: SPARK-42084
URL: https://issues.apache.org/jira/browse/SPARK-42084
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.1
Reporter: Wenchen Fan
[jira] [Commented] (SPARK-42083) Make ExecutorPodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677156#comment-17677156 ]

Apache Spark commented on SPARK-42083:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39593

> Make ExecutorPodsAllocator extendable
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Commented] (SPARK-42083) Make ExecutorPodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677155#comment-17677155 ]

Apache Spark commented on SPARK-42083:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39593

> Make ExecutorPodsAllocator extendable
>
> Key: SPARK-42083
> URL: https://issues.apache.org/jira/browse/SPARK-42083
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Assigned] (SPARK-42083) Make ExecutorPodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42083: Assignee: Apache Spark > Make ExecutorPodsAllocator extendable > - > > Key: SPARK-42083 > URL: https://issues.apache.org/jira/browse/SPARK-42083 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42083) Make ExecutorPodsAllocator extendable
[ https://issues.apache.org/jira/browse/SPARK-42083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42083: Assignee: (was: Apache Spark) > Make ExecutorPodsAllocator extendable > - > > Key: SPARK-42083 > URL: https://issues.apache.org/jira/browse/SPARK-42083 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42081) improve the plan change validation
[ https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677154#comment-17677154 ] Apache Spark commented on SPARK-42081: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/39592 > improve the plan change validation > -- > > Key: SPARK-42081 > URL: https://issues.apache.org/jira/browse/SPARK-42081 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42081) improve the plan change validation
[ https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42081: Assignee: Apache Spark > improve the plan change validation > -- > > Key: SPARK-42081 > URL: https://issues.apache.org/jira/browse/SPARK-42081 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42081) improve the plan change validation
[ https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42081: Assignee: (was: Apache Spark) > improve the plan change validation > -- > > Key: SPARK-42081 > URL: https://issues.apache.org/jira/browse/SPARK-42081 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42081) improve the plan change validation
[ https://issues.apache.org/jira/browse/SPARK-42081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677152#comment-17677152 ] Apache Spark commented on SPARK-42081: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/39592 > improve the plan change validation > -- > > Key: SPARK-42081 > URL: https://issues.apache.org/jira/browse/SPARK-42081 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42083) Make ExecutorPodsAllocator extendable
Dongjoon Hyun created SPARK-42083: - Summary: Make ExecutorPodsAllocator extendable Key: SPARK-42083 URL: https://issues.apache.org/jira/browse/SPARK-42083 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42082) Migrate ValueError into PySparkValueError and manage the functions.py
[ https://issues.apache.org/jira/browse/SPARK-42082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677151#comment-17677151 ] Haejoon Lee commented on SPARK-42082: - I'm working on this > Migrate ValueError into PySparkValueError and manage the functions.py > - > > Key: SPARK-42082 > URL: https://issues.apache.org/jira/browse/SPARK-42082 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all Python built-in exceptions into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41712) Migrate the Spark Connect errors into error classes.
[ https://issues.apache.org/jira/browse/SPARK-41712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677150#comment-17677150 ] Haejoon Lee commented on SPARK-41712: - I'm working on this > Migrate the Spark Connect errors into error classes. > > > Key: SPARK-41712 > URL: https://issues.apache.org/jira/browse/SPARK-41712 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We need to migrate the Spark Connect errors into the centralized error framework > by leveraging the error class logic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42082) Migrate ValueError into PySparkValueError and manage the functions.py
Haejoon Lee created SPARK-42082: --- Summary: Migrate ValueError into PySparkValueError and manage the functions.py Key: SPARK-42082 URL: https://issues.apache.org/jira/browse/SPARK-42082 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Haejoon Lee We should migrate all Python built-in exceptions into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
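Editor's note on SPARK-42082: the ticket is about replacing bare `raise ValueError(...)` calls in `functions.py` with a structured, error-class-based exception. The sketch below only illustrates the shape of such an exception; the constructor and registry of the real `PySparkValueError` in `pyspark.errors` may differ.
{code:python}
# Hypothetical sketch of an error-class-backed ValueError; not the actual
# pyspark.errors API.
class PySparkValueErrorSketch(ValueError):
    # Minimal in-file "error class" registry, just for the example.
    _TEMPLATES = {
        "NEGATIVE_VALUE": "Argument `{arg_name}` must be non-negative, got {value}.",
    }

    def __init__(self, error_class: str, **params: object) -> None:
        self.error_class = error_class
        self.params = params
        message = self._TEMPLATES[error_class].format(**params)
        super().__init__(f"[{error_class}] {message}")

def set_precision(precision: int) -> int:
    # Instead of `raise ValueError(...)`, raise the structured error so callers
    # (and tests) can match on the error class rather than on the message text.
    if precision < 0:
        raise PySparkValueErrorSketch("NEGATIVE_VALUE", arg_name="precision", value=precision)
    return precision

try:
    set_precision(-1)
except PySparkValueErrorSketch as e:
    print(e.error_class, "->", e)
{code}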
[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42077: - Assignee: Ruifeng Zheng > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42077. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39588 [https://github.com/apache/spark/pull/39588] > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42080) Add guideline for PySpark errors.
[ https://issues.apache.org/jira/browse/SPARK-42080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677148#comment-17677148 ] Haejoon Lee commented on SPARK-42080: - I'm working on this; we should introduce internal error classes first. > Add guideline for PySpark errors. > - > > Key: SPARK-42080 > URL: https://issues.apache.org/jira/browse/SPARK-42080 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Add guideline for PySpark errors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42081) improve the plan change validation
Wenchen Fan created SPARK-42081: --- Summary: improve the plan change validation Key: SPARK-42081 URL: https://issues.apache.org/jira/browse/SPARK-42081 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42076: - Assignee: Ruifeng Zheng > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42076. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39587 [https://github.com/apache/spark/pull/39587] > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42080) Add guideline for PySpark errors.
Haejoon Lee created SPARK-42080: --- Summary: Add guideline for PySpark errors. Key: SPARK-42080 URL: https://issues.apache.org/jira/browse/SPARK-42080 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Haejoon Lee Add guideline for PySpark errors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677144#comment-17677144 ] Apache Spark commented on SPARK-42078: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39591 > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677143#comment-17677143 ] Apache Spark commented on SPARK-42078: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39591 > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42078: Assignee: Apache Spark > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42078: Assignee: (was: Apache Spark) > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677142#comment-17677142 ] Ruifeng Zheng commented on SPARK-42032: --- Had an offline discussion with [~beliefer]; Spark Connect has the same behavior as the Scala Dataset API, which is different from PySpark in some cases: Dataset API {code:java} scala> spark.createDataFrame(Seq((1, Map("foo" -> -2.0, "bar" -> 2.0)))).show(100, 100) +---+-+ | _1| _2| +---+-+ | 1|{foo -> -2.0, bar -> 2.0}| +---+-+ {code} PySpark: {code:java} In [2]: spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})]).show(100, 100) +---+-+ | _1| _2| +---+-+ | 1|{bar -> 2.0, foo -> -2.0}| +---+-+ {code} so this should not be a bug in Connect. > Map data show in different order > > > Key: SPARK-42032 > URL: https://issues.apache.org/jira/browse/SPARK-42032 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > not sure whether this needs to be fixed: > {code:java} > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1623, in pyspark.sql.connect.functions.transform_keys > Failed example: > df.select(transform_keys( > "data", lambda k, _: upper(k)).alias("data_upper") > ).show(truncate=False) > Expected: > +-+ > |data_upper | > +-+ > |{BAR -> 2.0, FOO -> -2.0}| > +-+ > Got: > +-+ > |data_upper | > +-+ > |{FOO -> -2.0, BAR -> 2.0}| > +-+ > > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1630, in pyspark.sql.connect.functions.transform_values > Failed example: > df.select(transform_values( > "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v) > ).alias("new_data")).show(truncate=False) > Expected: > +---+ > |new_data | > +---+ > |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}| > +---+ > Got: > +---+ > |new_data | > +---+ > |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}| > +---+ > > ** >1 of 2 in pyspark.sql.connect.functions.transform_keys >1 of 2 in pyspark.sql.connect.functions.transform_values > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
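Editor's note on the ordering discussion above: because map key order is not guaranteed, a doctest or assertion that compares map columns is more robust if it checks the collected dict rather than `show()` output. Minimal sketch; the column names and local session are assumptions.
{code:python}
# Compare map contents order-insensitively instead of relying on show() text.
from pyspark.sql import SparkSession
from pyspark.sql.functions import transform_keys, upper

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], ["id", "data"])

row = df.select(transform_keys("data", lambda k, _: upper(k)).alias("data_upper")).head()
# Dict equality ignores key order, so this passes regardless of map ordering.
assert row["data_upper"] == {"FOO": -2.0, "BAR": 2.0}
{code}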
[jira] [Commented] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677141#comment-17677141 ] jiaan.geng commented on SPARK-41988: After my investigation, the result from Connect is the same as that of the Dataset API, so this is a bug in PySpark. > Fix map_filter and map_zip_with output order > > > Key: SPARK-41988 > URL: https://issues.apache.org/jira/browse/SPARK-41988 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > {code:java} > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1423, in pyspark.sql.connect.functions.map_filter > Failed example: > df.select(map_filter( > "data", lambda _, v: v > 30.0).alias("data_filtered") > ).show(truncate=False) > Expected: > +--+ > |data_filtered | > +--+ > |{baz -> 32.0, foo -> 42.0}| > +--+ > Got: > +--+ > |data_filtered | > +--+ > |{foo -> 42.0, baz -> 32.0}| > +--+ > > ** > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1465, in pyspark.sql.connect.functions.map_zip_with > Failed example: > df.select(map_zip_with( > "base", "ratio", lambda k, v1, v2: round(v1 * v2, > 2)).alias("updated_data") > ).show(truncate=False) > Expected: > +---+ > |updated_data | > +---+ > |{SALES -> 16.8, IT -> 48.0}| > +---+ > Got: > +---+ > |updated_data | > +---+ > |{IT -> 48.0, SALES -> 16.8}| > +---+ > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677140#comment-17677140 ] Apache Spark commented on SPARK-42079: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39590 > Rename proto messages for `toDF` and `withColumnsRenamed` > - > > Key: SPARK-42079 > URL: https://issues.apache.org/jira/browse/SPARK-42079 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42079: Assignee: Apache Spark > Rename proto messages for `toDF` and `withColumnsRenamed` > - > > Key: SPARK-42079 > URL: https://issues.apache.org/jira/browse/SPARK-42079 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42079: Assignee: (was: Apache Spark) > Rename proto messages for `toDF` and `withColumnsRenamed` > - > > Key: SPARK-42079 > URL: https://issues.apache.org/jira/browse/SPARK-42079 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42079) Rename proto messages for `toDF` and `WithColumnsRenamed`
Ruifeng Zheng created SPARK-42079: - Summary: Rename proto messages for `toDF` and `WithColumnsRenamed` Key: SPARK-42079 URL: https://issues.apache.org/jira/browse/SPARK-42079 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-42079: -- Summary: Rename proto messages for `toDF` and `withColumnsRenamed` (was: Rename proto messages for `toDF` and `WithColumnsRenamed`) > Rename proto messages for `toDF` and `withColumnsRenamed` > - > > Key: SPARK-42079 > URL: https://issues.apache.org/jira/browse/SPARK-42079 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-42078: Summary: Migrate errors thrown by JVM into PySpark Exception. (was: Migrate errors thrown by JVM into single file.) > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42078) Migrate errors thrown by JVM into PySpark Exception.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677139#comment-17677139 ] Haejoon Lee commented on SPARK-42078: - I'm working on it > Migrate errors thrown by JVM into PySpark Exception. > > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42078) Migrate errors thrown by JVM into single file.
[ https://issues.apache.org/jira/browse/SPARK-42078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-42078: Summary: Migrate errors thrown by JVM into single file. (was: Migrate errors thrown by JVM into PySparkException.) > Migrate errors thrown by JVM into single file. > -- > > Key: SPARK-42078 > URL: https://issues.apache.org/jira/browse/SPARK-42078 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42078) Migrate errors thrown by JVM into PySparkException.
Haejoon Lee created SPARK-42078: --- Summary: Migrate errors thrown by JVM into PySparkException. Key: SPARK-42078 URL: https://issues.apache.org/jira/browse/SPARK-42078 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Haejoon Lee We should migrate all exceptions generated on PySpark into PySparkException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
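Editor's note on SPARK-42078: the idea is to catch JVM-originated errors at the Py4J boundary and re-raise them as PySpark exception types. The sketch below is a simplified illustration under assumed names; it is not the actual converter in `pyspark.errors`.
{code:python}
# Simplified, hypothetical sketch of wrapping a JVM-side error in a
# Python-side exception type at the call boundary.
class PySparkJVMError(Exception):
    def __init__(self, java_class: str, message: str) -> None:
        self.java_class = java_class
        super().__init__(f"{java_class}: {message}")

def convert_jvm_exception(e: Exception) -> Exception:
    # Real PySpark would inspect the Py4J error object; here the Java class
    # name is simulated for demonstration.
    java_class = getattr(e, "java_class", "java.lang.RuntimeException")
    return PySparkJVMError(java_class, str(e))

def call_jvm(fn):
    try:
        return fn()
    except Exception as e:  # would be py4j.protocol.Py4JJavaError in practice
        raise convert_jvm_exception(e) from e

def fails():
    raise RuntimeError("table not found")

# Usage: the caller sees a PySparkJVMError instead of a raw backend error.
try:
    call_jvm(fails)
except PySparkJVMError as e:
    print(e)
{code}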
[jira] [Commented] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677131#comment-17677131 ] Apache Spark commented on SPARK-42077: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39588 > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42077: Assignee: Apache Spark > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677129#comment-17677129 ] Apache Spark commented on SPARK-42077: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39588 > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42077) Literal should throw TypeError for unsupported DataType
[ https://issues.apache.org/jira/browse/SPARK-42077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42077: Assignee: (was: Apache Spark) > Literal should throw TypeError for unsupported DataType > --- > > Key: SPARK-42077 > URL: https://issues.apache.org/jira/browse/SPARK-42077 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42077) Literal should throw TypeError for unsupported DataType
Ruifeng Zheng created SPARK-42077: - Summary: Literal should throw TypeError for unsupported DataType Key: SPARK-42077 URL: https://issues.apache.org/jira/browse/SPARK-42077 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
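Editor's note on SPARK-42077: the point is that an unsupported value passed to `lit()` should surface as a `TypeError` (wrong kind of object), not a `ValueError`. The sketch below mirrors that behavior with a simplified inference function; it is not the actual `LiteralExpression._infer_type` in `pyspark.sql.connect.expressions`.
{code:python}
# Simplified literal type inference that raises TypeError for unsupported values.
import datetime
import decimal
from pyspark.sql.types import (
    BooleanType, DataType, DateType, DecimalType, DoubleType, LongType, StringType,
)

def infer_literal_type(value: object) -> DataType:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return BooleanType()
    if isinstance(value, int):
        return LongType()
    if isinstance(value, float):
        return DoubleType()
    if isinstance(value, str):
        return StringType()
    if isinstance(value, decimal.Decimal):
        return DecimalType()
    if isinstance(value, datetime.date):
        return DateType()
    raise TypeError(f"Unsupported literal type: {type(value).__name__}")

try:
    infer_literal_type(object())
except TypeError as e:
    print(e)  # Unsupported literal type: object
{code}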
[jira] [Commented] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677127#comment-17677127 ] Apache Spark commented on SPARK-42076: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39587 > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677126#comment-17677126 ] Apache Spark commented on SPARK-42076: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39587 > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42076: Assignee: (was: Apache Spark) > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-42076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42076: Assignee: Apache Spark > Factor data conversion `arrow -> rows` out to `conversion.py` > - > > Key: SPARK-42076 > URL: https://issues.apache.org/jira/browse/SPARK-42076 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42076) Factor data conversion `arrow -> rows` out to `conversion.py`
Ruifeng Zheng created SPARK-42076: - Summary: Factor data conversion `arrow -> rows` out to `conversion.py` Key: SPARK-42076 URL: https://issues.apache.org/jira/browse/SPARK-42076 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
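Editor's note on SPARK-42076: the refactor moves the "Arrow table to `Row` objects" conversion into a dedicated `conversion.py`. A minimal sketch of such a helper is shown below; the real conversion also handles nested, map, and temporal types, which are omitted here, and the function name is an assumption.
{code:python}
# Minimal "Arrow table -> list of Rows" helper; nested/temporal handling omitted.
from typing import List

import pyarrow as pa
from pyspark.sql import Row

def arrow_table_to_rows(table: pa.Table) -> List[Row]:
    names = table.schema.names
    # to_pylist() yields one dict per row, keyed by column name.
    return [Row(**{name: rec[name] for name in names}) for rec in table.to_pylist()]

tbl = pa.table({"id": [1, 2], "name": ["a", "b"]})
for r in arrow_table_to_rows(tbl):
    print(r)
{code}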
[jira] [Commented] (SPARK-42075) Deprecate DStream API
[ https://issues.apache.org/jira/browse/SPARK-42075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677114#comment-17677114 ] Chaoqin Li commented on SPARK-42075: [~kabhwan] , sure, I can take this. Thanks! > Deprecate DStream API > - > > Key: SPARK-42075 > URL: https://issues.apache.org/jira/browse/SPARK-42075 > Project: Spark > Issue Type: Task > Components: DStreams >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Blocker > > We've got consensus to deprecate DStream API in 3.4.0. > [https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x] > This Jira ticket is to track the effort of action items: > * Add "deprecation" annotation to the user-facing public API in streaming > directory (DStream) > * Write a release note to explicitly mention the deprecation. (Maybe promote > again that they are encouraged to move to SS.) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677106#comment-17677106 ] jiaan.geng commented on SPARK-42032: This issue duplicated with https://issues.apache.org/jira/browse/SPARK-41988 I'm doing now. > Map data show in different order > > > Key: SPARK-42032 > URL: https://issues.apache.org/jira/browse/SPARK-42032 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > not sure whether this needs to be fixed: > {code:java} > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1623, in pyspark.sql.connect.functions.transform_keys > Failed example: > df.select(transform_keys( > "data", lambda k, _: upper(k)).alias("data_upper") > ).show(truncate=False) > Expected: > +-+ > |data_upper | > +-+ > |{BAR -> 2.0, FOO -> -2.0}| > +-+ > Got: > +-+ > |data_upper | > +-+ > |{FOO -> -2.0, BAR -> 2.0}| > +-+ > > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1630, in pyspark.sql.connect.functions.transform_values > Failed example: > df.select(transform_values( > "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v) > ).alias("new_data")).show(truncate=False) > Expected: > +---+ > |new_data | > +---+ > |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}| > +---+ > Got: > +---+ > |new_data | > +---+ > |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}| > +---+ > > ** >1 of 2 in pyspark.sql.connect.functions.transform_keys >1 of 2 in pyspark.sql.connect.functions.transform_values > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41586: Assignee: Haejoon Lee > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Introduce new package `pyspark.errors` for improving PySpark error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41586. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39387 [https://github.com/apache/spark/pull/39387] > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Introduce new package `pyspark.errors` for improving PySpark error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41903) Support data type ndarray
[ https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41903: Assignee: Ruifeng Zheng (was: Sandeep Singh) > Support data type ndarray > - > > Key: SPARK-41903 > URL: https://issues.apache.org/jira/browse/SPARK-41903 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > import numpy as np > arr_dtype_to_spark_dtypes = [ > ("int8", [("b", "array")]), > ("int16", [("b", "array")]), > ("int32", [("b", "array")]), > ("int64", [("b", "array")]), > ("float32", [("b", "array")]), > ("float64", [("b", "array")]), > ] > for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes: > arr = np.array([1, 2]).astype(t) > self.assertEqual( > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > ) > arr = np.array([1, 2]).astype(np.uint) > with self.assertRaisesRegex( > TypeError, "The type of array scalar '%s' is not supported" % arr.dtype > ): > self.spark.range(1).select(lit(arr).alias("b")){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 1100, in test_ndarray_input > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line > 332, in wrapped > return getattr(functions, f.__name__)(*args, **kwargs) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 198, in lit > return Column(LiteralExpression._from_value(col)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 266, in _from_value > return LiteralExpression(value=value, > dataType=LiteralExpression._infer_type(value)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 262, in _infer_type > raise ValueError(f"Unsupported Data Type {type(value).__name__}") > ValueError: Unsupported Data Type ndarray {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41903) Support data type ndarray
[ https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41903. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39570 [https://github.com/apache/spark/pull/39570] > Support data type ndarray > - > > Key: SPARK-41903 > URL: https://issues.apache.org/jira/browse/SPARK-41903 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > import numpy as np > arr_dtype_to_spark_dtypes = [ > ("int8", [("b", "array")]), > ("int16", [("b", "array")]), > ("int32", [("b", "array")]), > ("int64", [("b", "array")]), > ("float32", [("b", "array")]), > ("float64", [("b", "array")]), > ] > for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes: > arr = np.array([1, 2]).astype(t) > self.assertEqual( > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > ) > arr = np.array([1, 2]).astype(np.uint) > with self.assertRaisesRegex( > TypeError, "The type of array scalar '%s' is not supported" % arr.dtype > ): > self.spark.range(1).select(lit(arr).alias("b")){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 1100, in test_ndarray_input > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line > 332, in wrapped > return getattr(functions, f.__name__)(*args, **kwargs) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 198, in lit > return Column(LiteralExpression._from_value(col)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 266, in _from_value > return LiteralExpression(value=value, > dataType=LiteralExpression._infer_type(value)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 262, in _infer_type > raise ValueError(f"Unsupported Data Type {type(value).__name__}") > ValueError: Unsupported Data Type ndarray {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41903) Support data type ndarray
[ https://issues.apache.org/jira/browse/SPARK-41903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41903: Assignee: Sandeep Singh > Support data type ndarray > - > > Key: SPARK-41903 > URL: https://issues.apache.org/jira/browse/SPARK-41903 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > > {code:java} > import numpy as np > arr_dtype_to_spark_dtypes = [ > ("int8", [("b", "array")]), > ("int16", [("b", "array")]), > ("int32", [("b", "array")]), > ("int64", [("b", "array")]), > ("float32", [("b", "array")]), > ("float64", [("b", "array")]), > ] > for t, expected_spark_dtypes in arr_dtype_to_spark_dtypes: > arr = np.array([1, 2]).astype(t) > self.assertEqual( > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > ) > arr = np.array([1, 2]).astype(np.uint) > with self.assertRaisesRegex( > TypeError, "The type of array scalar '%s' is not supported" % arr.dtype > ): > self.spark.range(1).select(lit(arr).alias("b")){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 1100, in test_ndarray_input > expected_spark_dtypes, > self.spark.range(1).select(lit(arr).alias("b")).dtypes > File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line > 332, in wrapped > return getattr(functions, f.__name__)(*args, **kwargs) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 198, in lit > return Column(LiteralExpression._from_value(col)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 266, in _from_value > return LiteralExpression(value=value, > dataType=LiteralExpression._infer_type(value)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/expressions.py", > line 262, in _infer_type > raise ValueError(f"Unsupported Data Type {type(value).__name__}") > ValueError: Unsupported Data Type ndarray {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42073) Enable pyspark.sql.tests.test_types 2 test cases
[ https://issues.apache.org/jira/browse/SPARK-42073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42073. -- Resolution: Fixed Issue resolved by pull request 39583 [https://github.com/apache/spark/pull/39583] > Enable pyspark.sql.tests.test_types 2 test cases > > > Key: SPARK-42073 > URL: https://issues.apache.org/jira/browse/SPARK-42073 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
[ https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42074. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39584 [https://github.com/apache/spark/pull/39584] > Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration > -- > > Key: SPARK-42074 > URL: https://issues.apache.org/jira/browse/SPARK-42074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
[ https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42074: - Assignee: Dongjoon Hyun > Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration > -- > > Key: SPARK-42074 > URL: https://issues.apache.org/jira/browse/SPARK-42074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
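Editor's note on SPARK-42074: the benchmark change itself is on the Scala side, but the configuration involved is the standard Kryo setup; the sketch below shows the equivalent settings on a local session, where `spark.kryo.registrationRequired` makes unregistered classes fail fast.
{code:python}
# Enable Kryo and require explicit class registration (fails fast on
# unregistered classes, which is what enforces SQL class registration).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[2]")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrationRequired", "true")
    .getOrCreate()
)
print(spark.conf.get("spark.serializer"))
{code}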
[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)
[ https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677094#comment-17677094 ] Sandeep Singh commented on SPARK-42002: --- I'm working on this > Implement DataFrameWriterV2 (ReadwriterV2Tests) > --- > > Key: SPARK-42002 > URL: https://issues.apache.org/jira/browse/SPARK-42002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api) > self = > testMethod=test_api> > def test_api(self): > df = self.df > > writer = df.writeTo("testcat.t") > ../test_readwriter.py:185: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = > {} > def writeTo(self, *args: Any, **kwargs: Any) -> None: > > raise NotImplementedError("writeTo() is not implemented.") > E NotImplementedError: writeTo() is not implemented. > ../../connect/dataframe.py:1529: NotImplementedError > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42075) Deprecate DStream API
[ https://issues.apache.org/jira/browse/SPARK-42075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677093#comment-17677093 ] Jungtaek Lim commented on SPARK-42075: -- [~Chaoqin] Could you please take this over? Thanks in advance! > Deprecate DStream API > - > > Key: SPARK-42075 > URL: https://issues.apache.org/jira/browse/SPARK-42075 > Project: Spark > Issue Type: Task > Components: DStreams >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Blocker > > We've got consensus to deprecate DStream API in 3.4.0. > [https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x] > This Jira ticket is to track the effort of action items: > * Add "deprecation" annotation to the user-facing public API in streaming > directory (DStream) > * Write a release note to explicitly mention the deprecation. (Maybe promote > again that they are encouraged to move to SS.) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42075) Deprecate DStream API
Jungtaek Lim created SPARK-42075: Summary: Deprecate DStream API Key: SPARK-42075 URL: https://issues.apache.org/jira/browse/SPARK-42075 Project: Spark Issue Type: Task Components: DStreams Affects Versions: 3.4.0 Reporter: Jungtaek Lim We've got consensus to deprecate DStream API in 3.4.0. [https://lists.apache.org/thread/342qnjxmnoydzxr0k8yq4roxdmqjhw9x] This Jira ticket is to track the effort of action items: * Add "deprecation" annotation to the user-facing public API in streaming directory (DStream) * Write a release note to explicitly mention the deprecation. (Maybe promote again that they are encouraged to move to SS.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
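Editor's note on SPARK-42075: the first action item ("add deprecation annotation to the user-facing DStream API") would, on the Python side, typically mean emitting a deprecation warning from the public entry points. The decorator below is an illustrative sketch only, with an assumed warning category and wording, not the actual patch.
{code:python}
# Illustrative deprecation shim for a DStream-facing entry point.
import warnings
from functools import wraps

def deprecated_dstream_api(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "DStream (spark.streaming) is deprecated as of Spark 3.4.0; "
            "migrate to Structured Streaming.",
            FutureWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper

@deprecated_dstream_api
def socket_text_stream(hostname: str, port: int):
    """Stand-in for StreamingContext.socketTextStream, for demonstration only."""
    return (hostname, port)

socket_text_stream("localhost", 9999)  # emits a FutureWarning
{code}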
[jira] [Commented] (SPARK-42074) Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration
[ https://issues.apache.org/jira/browse/SPARK-42074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677089#comment-17677089 ] Apache Spark commented on SPARK-42074: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39584 > Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration > -- > > Key: SPARK-42074 > URL: https://issues.apache.org/jira/browse/SPARK-42074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org