[jira] [Commented] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py
    [ https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425947#comment-17425947 ]

dgd_contributor commented on SPARK-36952:
-----------------------------------------

working on this

> Inline type hints for python/pyspark/resource/information.py and
> python/pyspark/resource/profile.py
> ----------------------------------------------------------------
>
>                 Key: SPARK-36952
>                 URL: https://issues.apache.org/jira/browse/SPARK-36952
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and
> python/pyspark/resource/profile.py

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py
    [ https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36952:
------------------------------------
    Comment: was deleted

(was: working on this)

> Inline type hints for python/pyspark/resource/information.py and
> python/pyspark/resource/profile.py
> ----------------------------------------------------------------
>
>                 Key: SPARK-36952
>                 URL: https://issues.apache.org/jira/browse/SPARK-36952
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and
> python/pyspark/resource/profile.py
[jira] [Created] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py
dgd_contributor created SPARK-36952:
---------------------------------------

             Summary: Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py
                 Key: SPARK-36952
                 URL: https://issues.apache.org/jira/browse/SPARK-36952
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py
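The sub-tasks above move annotations out of separate `*.pyi` stub files and into the modules themselves. A minimal sketch of what that migration looks like, using a simplified stand-in class (not the actual pyspark source):

```python
# Before (separate stub file, e.g. information.pyi):
#     class ResourceInformation:
#         def __init__(self, name: str, addresses: List[str]) -> None: ...
#         @property
#         def name(self) -> str: ...
#
# After: the same hints written inline in information.py, so static type
# checkers can also check the function bodies, not just the signatures.
from typing import List


class ResourceInformation:
    """Simplified stand-in for a resource description (name plus addresses)."""

    def __init__(self, name: str, addresses: List[str]) -> None:
        self._name = name
        self._addresses = addresses

    @property
    def name(self) -> str:
        return self._name

    @property
    def addresses(self) -> List[str]:
        return self._addresses
```

With the hints inline, the stub file can be deleted and tools like mypy check both callers and the implementation.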
[jira] [Created] (SPARK-36946) Support time for ps.to_datetime
dgd_contributor created SPARK-36946:
---------------------------------------

             Summary: Support time for ps.to_datetime
                 Key: SPARK-36946
                 URL: https://issues.apache.org/jira/browse/SPARK-36946
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor
[jira] [Created] (SPARK-36902) Migrate CreateTableAsSelectStatement to v2 command
dgd_contributor created SPARK-36902:
---------------------------------------

             Summary: Migrate CreateTableAsSelectStatement to v2 command
                 Key: SPARK-36902
                 URL: https://issues.apache.org/jira/browse/SPARK-36902
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: dgd_contributor
[jira] [Updated] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py
    [ https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36886:
------------------------------------
    Description: Inline type hints for python/pyspark/sql/context.py from Inline type hints for python/pyspark/sql/context.pyi.

  (was: Inline type hints for python/pyspark/sql/column.py from Inline type hints for python/pyspark/sql/column.pyi.)

> Inline type hints for python/pyspark/sql/context.py
> ---------------------------------------------------
>
>                 Key: SPARK-36886
>                 URL: https://issues.apache.org/jira/browse/SPARK-36886
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/sql/context.py from Inline type hints
> for python/pyspark/sql/context.pyi.
[jira] [Updated] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py
    [ https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36886:
------------------------------------
    Summary: Inline type hints for python/pyspark/sql/context.py  (was: Inline type hints for python/pyspark/sql/column.py)

> Inline type hints for python/pyspark/sql/context.py
> ---------------------------------------------------
>
>                 Key: SPARK-36886
>                 URL: https://issues.apache.org/jira/browse/SPARK-36886
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/sql/column.py from Inline type hints for
> python/pyspark/sql/column.pyi.
[jira] [Commented] (SPARK-36887) Inline type hints for python/pyspark/sql/conf.py
    [ https://issues.apache.org/jira/browse/SPARK-36887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421880#comment-17421880 ]

dgd_contributor commented on SPARK-36887:
-----------------------------------------

I'm working on this.

> Inline type hints for python/pyspark/sql/conf.py
> ------------------------------------------------
>
>                 Key: SPARK-36887
>                 URL: https://issues.apache.org/jira/browse/SPARK-36887
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/sql/session.py from Inline type hints
> for python/pyspark/sql/conf.pyi.
[jira] [Created] (SPARK-36887) Inline type hints for python/pyspark/sql/conf.py
dgd_contributor created SPARK-36887:
---------------------------------------

             Summary: Inline type hints for python/pyspark/sql/conf.py
                 Key: SPARK-36887
                 URL: https://issues.apache.org/jira/browse/SPARK-36887
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


Inline type hints for python/pyspark/sql/session.py from Inline type hints for python/pyspark/sql/conf.pyi.
[jira] [Commented] (SPARK-36886) Inline type hints for python/pyspark/sql/column.py
    [ https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421878#comment-17421878 ]

dgd_contributor commented on SPARK-36886:
-----------------------------------------

Working on this.

> Inline type hints for python/pyspark/sql/column.py
> --------------------------------------------------
>
>                 Key: SPARK-36886
>                 URL: https://issues.apache.org/jira/browse/SPARK-36886
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> Inline type hints for python/pyspark/sql/column.py from Inline type hints for
> python/pyspark/sql/column.pyi.
[jira] [Created] (SPARK-36886) Inline type hints for python/pyspark/sql/column.py
dgd_contributor created SPARK-36886:
---------------------------------------

             Summary: Inline type hints for python/pyspark/sql/column.py
                 Key: SPARK-36886
                 URL: https://issues.apache.org/jira/browse/SPARK-36886
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


Inline type hints for python/pyspark/sql/column.py from Inline type hints for python/pyspark/sql/column.pyi.
[jira] [Commented] (SPARK-36845) Inline type hint files
    [ https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421859#comment-17421859 ]

dgd_contributor commented on SPARK-36845:
-----------------------------------------

Can i work on this :D ?

> Inline type hint files
> ----------------------
>
>                 Key: SPARK-36845
>                 URL: https://issues.apache.org/jira/browse/SPARK-36845
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 3.3.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected
> types for functions, but we can also take advantage of static type checking
> within the functions by inlining the type hints.
[jira] [Commented] (SPARK-36785) Fix ps.DataFrame.isin
    [ https://issues.apache.org/jira/browse/SPARK-36785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416283#comment-17416283 ]

dgd_contributor commented on SPARK-36785:
-----------------------------------------

I am working on this.

> Fix ps.DataFrame.isin
> ---------------------
>
>                 Key: SPARK-36785
>                 URL: https://issues.apache.org/jira/browse/SPARK-36785
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> psdf = ps.DataFrame(
> ...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None],
> ...      "b": [None, 5, None, 3, 2, 1, None, 0, 0],
> ...      "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
> ... )
> >>> psdf
>      a    b  c
> 0  NaN  NaN  1
> 1  2.0  5.0  5
> 2  3.0  NaN  1
> 3  4.0  3.0  3
> 4  5.0  2.0  2
> 5  6.0  1.0  1
> 6  7.0  NaN  1
> 7  8.0  0.0  0
> 8  NaN  0.0  0
> >>> other = [1, 2, None]
> >>> psdf.isin(other)
>       a     b     c
> 0  None  None  True
> 1  True  None  None
> 2  None  None  True
> 3  None  None  None
> 4  None  True  True
> 5  None  True  True
> 6  None  None  True
> 7  None  None  None
> 8  None  None  None
> >>> psdf.isin(other).dtypes
> a    bool
> b    bool
> c    bool
> dtype: object
> >>> psdf.to_pandas().isin(other).dtypes
> a    bool
> b    bool
> c    bool
> dtype: object
> >>> psdf.to_pandas().isin(other)
>        a      b      c
> 0  False  False   True
> 1   True  False  False
> 2  False  False   True
> 3  False  False  False
> 4  False   True   True
> 5  False   True   True
> 6  False  False   True
> 7  False  False  False
> 8  False  False  False
> >>>
> {code}
[jira] [Created] (SPARK-36785) Fix ps.DataFrame.isin
dgd_contributor created SPARK-36785:
---------------------------------------

             Summary: Fix ps.DataFrame.isin
                 Key: SPARK-36785
                 URL: https://issues.apache.org/jira/browse/SPARK-36785
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


{code:python}
>>> psdf = ps.DataFrame(
...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None],
...      "b": [None, 5, None, 3, 2, 1, None, 0, 0],
...      "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
... )
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]
>>> psdf.isin(other)
      a     b     c
0  None  None  True
1  True  None  None
2  None  None  True
3  None  None  None
4  None  True  True
5  None  True  True
6  None  None  True
7  None  None  None
8  None  None  None
>>> psdf.isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
>>>
{code}
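The `None` entries in the reported output come from SQL three-valued logic: in Spark, a comparison involving NULL yields NULL, and `x IN (...)` can therefore be NULL rather than False, while pandas always produces a plain boolean (NaN never matches anything). A plain-Python sketch of the two semantics, as a simplified model rather than the pyspark code:

```python
# Simplified model of the mismatch behind ps.DataFrame.isin returning None.

def spark_style_isin(value, other):
    """SQL IN semantics: NULL operands propagate as None instead of False."""
    if value is None:
        return None  # NULL IN (...) -> NULL
    if value in [o for o in other if o is not None]:
        return True
    # No match: if the list contains NULL the answer is unknown (None).
    return None if any(o is None for o in other) else False

def pandas_style_isin(value, other):
    """pandas semantics: NaN/None never matches; result is always a bool."""
    return value is not None and value in [o for o in other if o is not None]
```

With `other = [1, 2, None]`, `spark_style_isin(3, other)` is `None` while `pandas_style_isin(3, other)` is `False`; the fix amounts to coercing those unknowns to `False` so the result matches pandas.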
[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Affects Version/s:     (was: 3.3)
                       3.3.0

> Fix Series.isin when Series has NaN values
> ------------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Affects Version/s: 3.2.0

> Fix Series.isin when Series has NaN values
> ------------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0, 3.3
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
[jira] [Updated] (SPARK-36779) Error when list of data type tuples has len = 1
    [ https://issues.apache.org/jira/browse/SPARK-36779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36779:
------------------------------------
    Summary: Error when list of data type tuples has len = 1  (was: Fix when list of data type tuples has len = 1)

> Error when list of data type tuples has len = 1
> -----------------------------------------------
>
>                 Key: SPARK-36779
>                 URL: https://issues.apache.org/jira/browse/SPARK-36779
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> ps.DataFrame[("a", int), [int]]
> typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, int]
> >>> ps.DataFrame[("a", int), [("b", int)]]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/frame.py", line 11998, in __class_getitem__
>     return create_tuple_for_frame_type(params)
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 685, in create_tuple_for_frame_type
>     return Tuple[extract_types(params)]
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 755, in extract_types
>     return (index_type,) + extract_types(data_types)
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 770, in extract_types
>     raise TypeError(
> TypeError: Type hints should be specified as one of:
>   - DataFrame[type, type, ...]
>   - DataFrame[name: type, name: type, ...]
>   - DataFrame[dtypes instance]
>   - DataFrame[zip(names, types)]
>   - DataFrame[index_type, [type, ...]]
>   - DataFrame[(index_name, index_type), [(name, type), ...]]
>   - DataFrame[dtype instance, dtypes instance]
>   - DataFrame[(index_name, index_type), zip(names, types)]
> However, got [('b', <class 'int'>)].
> {code}
[jira] [Created] (SPARK-36779) Fix when list of data type tuples has len = 1
dgd_contributor created SPARK-36779:
---------------------------------------

             Summary: Fix when list of data type tuples has len = 1
                 Key: SPARK-36779
                 URL: https://issues.apache.org/jira/browse/SPARK-36779
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


{code:python}
>>> ps.DataFrame[("a", int), [int]]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, int]
>>> ps.DataFrame[("a", int), [("b", int)]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/frame.py", line 11998, in __class_getitem__
    return create_tuple_for_frame_type(params)
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 685, in create_tuple_for_frame_type
    return Tuple[extract_types(params)]
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 755, in extract_types
    return (index_type,) + extract_types(data_types)
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 770, in extract_types
    raise TypeError(
TypeError: Type hints should be specified as one of:
  - DataFrame[type, type, ...]
  - DataFrame[name: type, name: type, ...]
  - DataFrame[dtypes instance]
  - DataFrame[zip(names, types)]
  - DataFrame[index_type, [type, ...]]
  - DataFrame[(index_name, index_type), [(name, type), ...]]
  - DataFrame[dtype instance, dtypes instance]
  - DataFrame[(index_name, index_type), zip(names, types)]
However, got [('b', <class 'int'>)].
{code}
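The failing case is the `DataFrame[(index_name, index_type), [(name, type), ...]]` form when the data list holds exactly one tuple: the parser must still treat a single `("b", int)` as a named column rather than falling through to the error branch. A hypothetical normalizer illustrating the distinction the fix needs to make (not the actual `typehints.py` logic):

```python
# Hypothetical sketch: turn either [type, ...] or [(name, type), ...] into a
# uniform list of (name, type) pairs, including the len == 1 case that the
# reported traceback mishandles.

def normalize_data_types(data_types):
    """Return [(name_or_None, type), ...] for both supported list forms."""
    pairs = []
    for entry in data_types:
        if isinstance(entry, tuple) and len(entry) == 2 and isinstance(entry[0], str):
            pairs.append(entry)          # ("b", int) style: named column
        else:
            pairs.append((None, entry))  # bare type: unnamed column
    return pairs
```

The point is that the branch is chosen per element, so a one-element list of tuples parses the same way as a longer one.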
[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Summary: Fix Series.isin when Series has NaN values  (was: Fix Series.isin when it have NaN)

> Fix Series.isin when Series has NaN values
> ------------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
[jira] [Updated] (SPARK-36762) Fix Series.isin when it have NaN
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Summary: Fix Series.isin when it have NaN  (was: Fix Series.isin when values have NaN)

> Fix Series.isin when it have NaN
> --------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
[jira] [Updated] (SPARK-36762) Fix Series.isin when values have NaN
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Description: 
{code:python}
>>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
>>> psser = ps.from_pandas(pser)
>>> pser.isin([1, 3, 5, None])
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7    False
8    False
dtype: bool
>>> psser.isin([1, 3, 5, None])
0    None
1    True
2    None
3    True
4    None
5    True
6    None
7    None
8    None
dtype: object
{code}

> Fix Series.isin when values have NaN
> ------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3
>            Reporter: dgd_contributor
>            Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
[jira] [Updated] (SPARK-36762) Fix Series.isin when values have NaN
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Summary: Fix Series.isin when values have NaN  (was: Support list-like types for pandas/base.isin)

> Fix Series.isin when values have NaN
> ------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3
>            Reporter: dgd_contributor
>            Priority: Major
[jira] [Updated] (SPARK-36762) Support list-like types for pandas/base.isin
    [ https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dgd_contributor updated SPARK-36762:
------------------------------------
    Summary: Support list-like types for pandas/base.isin  (was: Support list-like types for base.isin)

> Support list-like types for pandas/base.isin
> --------------------------------------------
>
>                 Key: SPARK-36762
>                 URL: https://issues.apache.org/jira/browse/SPARK-36762
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3
>            Reporter: dgd_contributor
>            Priority: Major
[jira] [Created] (SPARK-36762) Support list-like types for base.isin
dgd_contributor created SPARK-36762:
---------------------------------------

             Summary: Support list-like types for base.isin
                 Key: SPARK-36762
                 URL: https://issues.apache.org/jira/browse/SPARK-36762
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3
            Reporter: dgd_contributor
[jira] [Commented] (SPARK-36742) Fix ps.to_datetime with plurals of keys like years, months, days
    [ https://issues.apache.org/jira/browse/SPARK-36742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414194#comment-17414194 ]

dgd_contributor commented on SPARK-36742:
-----------------------------------------

working on this

> Fix ps.to_datetime with plurals of keys like years, months, days
> ----------------------------------------------------------------
>
>                 Key: SPARK-36742
>                 URL: https://issues.apache.org/jira/browse/SPARK-36742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dgd_contributor
>            Priority: Major
[jira] [Created] (SPARK-36742) Fix ps.to_datetime with plurals of keys like years, months, days
dgd_contributor created SPARK-36742:
---------------------------------------

             Summary: Fix ps.to_datetime with plurals of keys like years, months, days
                 Key: SPARK-36742
                 URL: https://issues.apache.org/jira/browse/SPARK-36742
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor
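When assembling a datetime from DataFrame columns, pandas accepts both singular and plural unit names ("year"/"years", "month"/"months", and so on), so the fix is essentially a normalization step before matching column names. A stdlib-only sketch of that normalization, as an illustration of the idea rather than the pyspark implementation:

```python
# Sketch: map plural unit column names to their singular form, then assemble
# a datetime. Simplified to the hour/minute/second units; pandas also
# supports ms/us/ns.
from datetime import datetime

_UNIT_MAP = {
    "year": "year", "years": "year",
    "month": "month", "months": "month",
    "day": "day", "days": "day",
    "hour": "hour", "hours": "hour",
    "minute": "minute", "minutes": "minute",
    "second": "second", "seconds": "second",
}

def assemble_datetime(row):
    """Build a datetime from a dict whose keys may be plural unit names."""
    units = {}
    for key, value in row.items():
        unit = _UNIT_MAP.get(key.lower())
        if unit is None:
            raise ValueError(f"cannot interpret column {key!r} as a date unit")
        units[unit] = int(value)
    missing = {"year", "month", "day"} - units.keys()
    if missing:
        raise ValueError(f"missing required units: {sorted(missing)}")
    return datetime(**units)
```

With this, `{"years": 2015, "months": 2, "days": 4}` assembles the same datetime as `{"year": 2015, "month": 2, "day": 4}`.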
[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas
    [ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414179#comment-17414179 ]

dgd_contributor commented on SPARK-36728:
-----------------------------------------

thanks, pandas API on spark does not work with plurals of the key, I can make a PR to fix it.

About "In [https://github.com/pandas-dev/pandas/blob/73c68257545b5f8530b7044f56647bd2db92e2ba/pandas/core/tools/datetimes.py#L922] There is hardcoded what yours columns can be named. It's very bad practice." Can you give your opinion ? [~hyukjin.kwon] [~ueshin]

> Can't create datetime object from anything other then year column Pyspark -
> koalas
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36728
>                 URL: https://issues.apache.org/jira/browse/SPARK-36728
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>         Attachments: pyspark_date.txt, pyspark_date2.txt
>
> If I create a datetime object it must be from columns named year.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
>
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352             if not found:
>    1353                 if missing_keys is None:
> -> 1354                     raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                 else:
>    1356                     missing_keys.append(key)
>
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas
[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413725#comment-17413725 ] dgd_contributor commented on SPARK-36728: - I think this is not a bug, same behavior in pandas. We need set name of columns like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]) or plurals of the same. [docs|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#] {code:java} >>> df_test = pd.DataFrame( ... { ... 'testyear': [2015, 2016], ... 'testmonth': [2, 3], ... 'testday': [4, 5], ... } ... ) >>> pd.to_datetime(df_test[['testyear', 'testmonth', 'testday']]) Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/venv/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 890, in to_datetime result = _assemble_from_unit_mappings(arg, errors, tz) File "/Users/dgd/spark/python/venv/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 996, in _assemble_from_unit_mappings raise ValueError( ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing >>> {code} > Can't create datetime object from anything other then year column Pyspark - > koalas > -- > > Key: SPARK-36728 > URL: https://issues.apache.org/jira/browse/SPARK-36728 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > Attachments: pyspark_date.txt > > > If I create a datetime object it must be from columns named year. 
> {code:python}
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
>
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352             if not found:
>    1353                 if missing_keys is None:
> -> 1354                     raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                 else:
>    1356                     missing_keys.append(key)
> {code}
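The constraint described in the comment above can be verified in plain pandas. The sketch below (column names are illustrative) shows both the rejection of unrecognized names and a rename-based workaround:

```python
import pandas as pd

df = pd.DataFrame({"testyear": [2015, 2016],
                   "testmonth": [2, 3],
                   "testday": [4, 5]})

# Unrecognized column names raise ValueError: to_datetime() assembles a
# datetime only from unit names such as 'year', 'month', 'day' (or plurals).
try:
    pd.to_datetime(df[["testyear", "testmonth", "testday"]])
except ValueError as exc:
    print("rejected:", exc)

# Workaround: rename the columns to the recognized unit names first.
renamed = df.rename(columns={"testyear": "year",
                             "testmonth": "month",
                             "testday": "day"})
dates = pd.to_datetime(renamed[["year", "month", "day"]])
print(dates)
```

The same rename works for pandas-on-Spark, since `ps.to_datetime` follows the pandas rule.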
[jira] [Updated] (SPARK-36722) Problems with update function in koalas - pyspark pandas.
[ https://issues.apache.org/jira/browse/SPARK-36722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36722: Affects Version/s: 3.2.0 > Problems with update function in koalas - pyspark pandas. > - > > Key: SPARK-36722 > URL: https://issues.apache.org/jira/browse/SPARK-36722 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0, 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Hi I am using "from pyspark import pandas as ps" in a master build yesterday. > I do have some columns that I need to join to one. > In pandas I use update. > 54 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION > 23 non-null object > 55 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P > 24348 non-null object > > > > pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P']) > > --- > AssertionError Traceback (most recent call last) > /tmp/ipykernel_73/391781247.py in > > 1 > pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P']) > /opt/spark/python/pyspark/pandas/series.py in update(self, other) > 4549 raise TypeError("'other' must be a Series") > 4550 > -> 4551 combined = combine_frames(self._psdf, other._psdf, how="leftouter") > 4552 > 4553 this_scol = > combined["this"]._internal.spark_column_for(self._column_label) > /opt/spark/python/pyspark/pandas/utils.py in combine_frames(this, how, > preserve_order_column, *args) > 139 elif len(args) == 1 and isinstance(args[0], DataFrame): > 140 assert isinstance(args[0], DataFrame) > --> 141 assert not same_anchor( > 142 this, args[0] > 143 ), "We don't need to combine. `this` and `that` are same." > AssertionError: We don't need to combine. `this` and `that` are same. 
> pd1.info() > 54 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION > 23 non-null object > 55 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P > 24348 non-null object -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
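For context, this is what a successful `Series.update` looks like in plain pandas, whose semantics the pandas-on-Spark method mirrors: it modifies the caller in place, taking only the non-NA values from the other Series. A minimal sketch with illustrative data:

```python
import pandas as pd

s1 = pd.Series([1.0, None, 3.0])
s2 = pd.Series([10.0, None, 30.0])

# update() is in place: non-NA values of s2 overwrite the aligned positions
# of s1; positions where s2 is NA keep s1's existing value (here, NaN).
s1.update(s2)
print(s1.tolist())
```

The bug report above is about the pandas-on-Spark implementation failing when both Series come from the same anchoring DataFrame, not about these semantics.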
[jira] [Commented] (SPARK-36722) Problems with update function in koalas - pyspark pandas.
[ https://issues.apache.org/jira/browse/SPARK-36722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413632#comment-17413632 ] dgd_contributor commented on SPARK-36722: - I will make a PR to fix this one soon > Problems with update function in koalas - pyspark pandas. > - > > Key: SPARK-36722 > URL: https://issues.apache.org/jira/browse/SPARK-36722 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Hi I am using "from pyspark import pandas as ps" in a master build yesterday. > I do have some columns that I need to join to one. > In pandas I use update. > 54 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION > 23 non-null object > 55 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P > 24348 non-null object > > > > pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P']) > > --- > AssertionError Traceback (most recent call last) > /tmp/ipykernel_73/391781247.py in > > 1 > pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P']) > /opt/spark/python/pyspark/pandas/series.py in update(self, other) > 4549 raise TypeError("'other' must be a Series") > 4550 > -> 4551 combined = combine_frames(self._psdf, other._psdf, how="leftouter") > 4552 > 4553 this_scol = > combined["this"]._internal.spark_column_for(self._column_label) > /opt/spark/python/pyspark/pandas/utils.py in combine_frames(this, how, > preserve_order_column, *args) > 139 elif len(args) == 1 and isinstance(args[0], DataFrame): > 140 assert isinstance(args[0], DataFrame) > --> 141 assert not same_anchor( > 142 this, args[0] > 143 ), "We don't need to combine. `this` and `that` are same." > AssertionError: We don't need to combine. 
`this` and `that` are same. > pd1.info() > 54 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION > 23 non-null object > 55 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P > 24348 non-null object -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35823) Spark UI-Stages DAG visualization is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35823: Affects Version/s: (was: 3.1.1) 3.3.0 3.2.0 3.0.3 3.1.2 > Spark UI-Stages DAG visualization is empty in IE11 > -- > > Key: SPARK-35823 > URL: https://issues.apache.org/jira/browse/SPARK-35823 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: jobit mathew >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35821: Affects Version/s: (was: 3.1.1) 3.3.0 3.2.0 3.0.3 3.1.2 > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: Umbrella > Components: Web UI >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > The other tabs look OK. > Spark job history shows the completed and incomplete applications list, but when > we go inside each application the same issue may be there. > Attaching some screenshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35822: Affects Version/s: 3.3.0 > Spark UI-Executor tab is empty in IE11 > -- > > Key: SPARK-35822 > URL: https://issues.apache.org/jira/browse/SPARK-35822 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG > > > The issue is also present in YARN mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36685) Fix wrong assert statement
[ https://issues.apache.org/jira/browse/SPARK-36685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36685: Affects Version/s: (was: 3.1.2) 3.3.0 3.1.3 3.2.0 2.4.8 3.0.3 > Fix wrong assert statement > -- > > Key: SPARK-36685 > URL: https://issues.apache.org/jira/browse/SPARK-36685 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.8, 3.0.3, 3.2.0, 3.1.3, 3.3.0 >Reporter: Sanket Reddy >Priority: Trivial > > {code:scala} > require(numCols == mat.numCols, "The number of rows of the matrices in this > sequence, " + "don't match!") > {code} > Shall the error message be "The number of columns..."? > This issue also appears in the open source spark: > > [https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala#L1266] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35822: Affects Version/s: (was: 3.1.1) 3.2.0 3.0.3 3.1.2 > Spark UI-Executor tab is empty in IE11 > -- > > Key: SPARK-35822 > URL: https://issues.apache.org/jira/browse/SPARK-35822 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG > > > The issue is also present in YARN mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35821: Issue Type: Umbrella (was: New Feature) > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: Umbrella > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > The other tabs look OK. > Spark job history shows the completed and incomplete applications list, but when > we go inside each application the same issue may be there. > Attaching some screenshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-35822: Attachment: Executortab_IE.PNG Executortab_Chrome.png > Spark UI-Executor tab is empty in IE11 > -- > > Key: SPARK-35822 > URL: https://issues.apache.org/jira/browse/SPARK-35822 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG > > > The issue is also present in YARN mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35823) Spark UI-Stages DAG visualization is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412080#comment-17412080 ] dgd_contributor commented on SPARK-35823: - I'm working on this. > Spark UI-Stages DAG visualization is empty in IE11 > -- > > Key: SPARK-35823 > URL: https://issues.apache.org/jira/browse/SPARK-35823 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35822) Spark UI-Executor tab is empty in IE11
[ https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412079#comment-17412079 ] dgd_contributor commented on SPARK-35822: - I'm working on this. > Spark UI-Executor tab is empty in IE11 > -- > > Key: SPARK-35822 > URL: https://issues.apache.org/jira/browse/SPARK-35822 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > > The issue is also present in YARN mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36671) Support Series.__and__ for Integral
[ https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36671: Summary: Support Series.__and__ for Integral (was: Support __and__ in num_ops.py) > Support Series.__and__ for Integral > --- > > Key: SPARK-36671 > URL: https://issues.apache.org/jira/browse/SPARK-36671 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dgd_contributor >Priority: Major > > {code:python} > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > 0False > 1False > 2 True > 3False > dtype: bool > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6]) > >>> pser1 & pser2 > 00 > 10 > 22 > dtype: int64 > >>> pser1 = ps.Series([1, 2, 3]) > >>> pser2 = ps.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > Traceback (most recent call last): > File "", line 1, in > File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ > return self._dtype_op.__and__(self, other) > File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line > 317, in __and__ > raise TypeError("Bitwise and can not be applied to %s." % > self.pretty_name) > TypeError: Bitwise and can not be applied to integrals. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
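For reference, the target pandas behavior for equal-length integer Series is an elementwise bitwise AND that preserves the integer dtype (this is the case the ticket's description shows):

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([4, 5, 6])

# Elementwise bitwise AND on aligned int64 Series:
# 1 & 4 = 0, 2 & 5 = 0, 3 & 6 = 2
result = a & b
print(result.tolist())  # -> [0, 0, 2]
```

When the indexes do not fully align, pandas falls back to a logical (boolean) result instead, as the first example in the ticket shows.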
[jira] [Commented] (SPARK-36671) Support __and__ in num_ops.py
[ https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411009#comment-17411009 ] dgd_contributor commented on SPARK-36671: - working on this > Support __and__ in num_ops.py > - > > Key: SPARK-36671 > URL: https://issues.apache.org/jira/browse/SPARK-36671 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dgd_contributor >Priority: Major > > {code:python} > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > 0False > 1False > 2 True > 3False > dtype: bool > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6]) > >>> pser1 & pser2 > 00 > 10 > 22 > dtype: int64 > >>> pser1 = ps.Series([1, 2, 3]) > >>> pser2 = ps.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > Traceback (most recent call last): > File "", line 1, in > File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ > return self._dtype_op.__and__(self, other) > File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line > 317, in __and__ > raise TypeError("Bitwise and can not be applied to %s." % > self.pretty_name) > TypeError: Bitwise and can not be applied to integrals. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36671) Support __and__ in num_ops.py
[ https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36671: Description: {code:python} >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6, 7]) >>> pser1 & pser2 0False 1False 2 True 3False dtype: bool >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6]) >>> pser1 & pser2 00 10 22 dtype: int64 >>> pser1 = ps.Series([1, 2, 3]) >>> pser2 = ps.Series([4, 5, 6, 7]) >>> pser1 & pser2 Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ return self._dtype_op.__and__(self, other) File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__ raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name) TypeError: Bitwise and can not be applied to integrals. {code} was: {code:python} >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6, 7]) >>> pser1 & pser2 0False 1False 2 True 3False dtype: bool >>> pser1 = ps.Series([1, 2, 3]) >>> pser2 = ps.Series([4, 5, 6, 7]) >>> pser1 & pser2 Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ return self._dtype_op.__and__(self, other) File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__ raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name) TypeError: Bitwise and can not be applied to integrals. 
{code} > Support __and__ in num_ops.py > - > > Key: SPARK-36671 > URL: https://issues.apache.org/jira/browse/SPARK-36671 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dgd_contributor >Priority: Major > > {code:python} > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > 0False > 1False > 2 True > 3False > dtype: bool > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6]) > >>> pser1 & pser2 > 00 > 10 > 22 > dtype: int64 > >>> pser1 = ps.Series([1, 2, 3]) > >>> pser2 = ps.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > Traceback (most recent call last): > File "", line 1, in > File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ > return self._dtype_op.__and__(self, other) > File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line > 317, in __and__ > raise TypeError("Bitwise and can not be applied to %s." % > self.pretty_name) > TypeError: Bitwise and can not be applied to integrals. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36671) Support __and__ in num_ops.py
[ https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36671: Description: {code:python} >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6, 7]) >>> pser1 & pser2 0False 1False 2 True 3False dtype: bool >>> pser1 = ps.Series([1, 2, 3]) >>> pser2 = ps.Series([4, 5, 6, 7]) >>> pser1 & pser2 Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ return self._dtype_op.__and__(self, other) File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__ raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name) TypeError: Bitwise and can not be applied to integrals. {code} was: {code:python} >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6, 7]) >>> pser1 ^ pser2 0 True 1 True 2 True 3False dtype: bool >>> pser1 = ps.Series([1, 2, 3]) >>> pser2 = ps.Series([4, 5, 6, 7]) >>> pser1 & pser2 Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ return self._dtype_op.__and__(self, other) File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__ raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name) TypeError: Bitwise and can not be applied to integrals. 
{code} > Support __and__ in num_ops.py > - > > Key: SPARK-36671 > URL: https://issues.apache.org/jira/browse/SPARK-36671 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dgd_contributor >Priority: Major > > {code:python} > >>> pser1 = pd.Series([1, 2, 3]) > >>> pser2 = pd.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > 0False > 1False > 2 True > 3False > dtype: bool > >>> pser1 = ps.Series([1, 2, 3]) > >>> pser2 = ps.Series([4, 5, 6, 7]) > >>> pser1 & pser2 > Traceback (most recent call last): > File "", line 1, in > File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ > return self._dtype_op.__and__(self, other) > File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line > 317, in __and__ > raise TypeError("Bitwise and can not be applied to %s." % > self.pretty_name) > TypeError: Bitwise and can not be applied to integrals. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36671) Support __and__ in num_ops.py
dgd_contributor created SPARK-36671: --- Summary: Support __and__ in num_ops.py Key: SPARK-36671 URL: https://issues.apache.org/jira/browse/SPARK-36671 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: dgd_contributor {code:python} >>> pser1 = pd.Series([1, 2, 3]) >>> pser2 = pd.Series([4, 5, 6, 7]) >>> pser1 ^ pser2 0 True 1 True 2 True 3False dtype: bool >>> pser1 = ps.Series([1, 2, 3]) >>> pser2 = ps.Series([4, 5, 6, 7]) >>> pser1 & pser2 Traceback (most recent call last): File "", line 1, in File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__ return self._dtype_op.__and__(self, other) File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__ raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name) TypeError: Bitwise and can not be applied to integrals. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36402) Implement Series.combine
[ https://issues.apache.org/jira/browse/SPARK-36402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405172#comment-17405172 ] dgd_contributor commented on SPARK-36402: - working on this > Implement Series.combine > > > Key: SPARK-36402 > URL: https://issues.apache.org/jira/browse/SPARK-36402 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
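For reference, pandas `Series.combine` (the API this sub-task ports) applies a binary function to each aligned pair of elements, with `fill_value` standing in for positions missing from one side. A minimal pandas sketch with illustrative data:

```python
import pandas as pd

s1 = pd.Series([1, 5, 3])
s2 = pd.Series([4, 2, 6])

# combine() applies the function to each aligned pair of elements.
out = s1.combine(s2, max)

# With mismatched indexes, fill_value substitutes for missing positions:
# s1 has no element at index 3, so max(0, 8) is computed there.
out2 = s1.combine(pd.Series([7, 0, 2, 8]), max, fill_value=0)
print(out.tolist(), out2.tolist())
```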
[jira] [Commented] (SPARK-36515) Improve test coverage for groupby.py and window.py.
[ https://issues.apache.org/jira/browse/SPARK-36515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405058#comment-17405058 ] dgd_contributor commented on SPARK-36515: - I'm working on this. > Improve test coverage for groupby.py and window.py. > --- > > Key: SPARK-36515 > URL: https://issues.apache.org/jira/browse/SPARK-36515 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > There is a lot of code that is not being tested in groupby.py and window.py, which > are the main implementation files for GroupBy and Rolling/Expanding. > We should improve the test coverage as much as possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36434) Implement DataFrame.lookup
[ https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151 ] dgd_contributor edited comment on SPARK-36434 at 8/20/21, 10:44 AM: Should we work on this? These docs show that DataFrame.lookup is deprecated: [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html] was (Author: dc-heros): should we work on this? this docs show dataframe.lookup is deprecated [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html] > Implement DataFrame.lookup > -- > > Key: SPARK-36434 > URL: https://issues.apache.org/jira/browse/SPARK-36434 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36434) Implement DataFrame.lookup
[ https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151 ] dgd_contributor commented on SPARK-36434: - should we work on this? this docs show dataframe.lookup is deprecated [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html] > Implement DataFrame.lookup > -- > > Key: SPARK-36434 > URL: https://issues.apache.org/jira/browse/SPARK-36434 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
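For context on the deprecation mentioned above: pandas deprecated `DataFrame.lookup` in favor of plain NumPy indexing. The sketch below (with illustrative data) is one equivalent replacement, using `get_indexer` rather than the `factorize`-based recipe in the pandas docs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]})

# Pick one column per row: the deprecated df.lookup(rows, cols) is
# equivalent to indexing the underlying array with positional indices.
cols = ["b", "a", "b"]
col_idx = df.columns.get_indexer(cols)           # label -> column position
vals = df.to_numpy()[np.arange(len(df)), col_idx]
print(vals.tolist())
```

If the row labels are not the default RangeIndex, map them to positions with `df.index.get_indexer(rows)` in the same way.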
[jira] [Commented] (SPARK-36387) Fix Series.astype from datetime to nullable string
[ https://issues.apache.org/jira/browse/SPARK-36387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397953#comment-17397953 ] dgd_contributor commented on SPARK-36387: - Sorry for my late reply, please go ahead! This is my first time in PySpark. > Fix Series.astype from datetime to nullable string > -- > > Key: SPARK-36387 > URL: https://issues.apache.org/jira/browse/SPARK-36387 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > Attachments: image-2021-08-12-14-24-31-321.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
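For reference, the pandas behavior this fix targets: casting a datetime Series to the nullable `string` dtype yields `StringDtype` values, with missing timestamps (`NaT`) becoming `<NA>` rather than the literal string "NaT". A sketch against recent pandas (the exact string formatting of the timestamps can vary between pandas versions):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2021-01-01", None]))

# astype("string") produces the nullable StringDtype; the NaT slot stays
# missing (pd.NA) instead of being stringified.
out = s.astype("string")
print(out.dtype, out.isna().tolist())
```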
[jira] [Commented] (SPARK-36387) Fix Series.astype from datetime to nullable string
[ https://issues.apache.org/jira/browse/SPARK-36387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392648#comment-17392648 ] dgd_contributor commented on SPARK-36387: - Can I work on this ? > Fix Series.astype from datetime to nullable string > -- > > Key: SPARK-36387 > URL: https://issues.apache.org/jira/browse/SPARK-36387 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36303) Refactor fourteenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391294#comment-17391294 ] dgd_contributor commented on SPARK-36303: - working on this > Refactor fourteenth set of 20 query execution errors to use error classes > - > > Key: SPARK-36303 > URL: https://issues.apache.org/jira/browse/SPARK-36303 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the fourteenth set of 20. > {code:java} > cannotGetEventTimeWatermarkError > cannotSetTimeoutTimestampError > batchMetadataFileNotFoundError > multiStreamingQueriesUsingPathConcurrentlyError > addFilesWithAbsolutePathUnsupportedError > microBatchUnsupportedByDataSourceError > cannotExecuteStreamingRelationExecError > invalidStreamingOutputModeError > catalogPluginClassNotFoundError > catalogPluginClassNotImplementedError > catalogPluginClassNotFoundForCatalogError > catalogFailToFindPublicNoArgConstructorError > catalogFailToCallPublicNoArgConstructorError > cannotInstantiateAbstractCatalogPluginClassError > failedToInstantiateConstructorForCatalogError > noSuchElementExceptionError > noSuchElementExceptionError > cannotMutateReadOnlySQLConfError > cannotCloneOrCopyReadOnlySQLConfError > cannotGetSQLConfInSchedulerEventLoopThreadError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36302) Refactor thirteenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391283#comment-17391283 ] dgd_contributor commented on SPARK-36302: - working on this. > Refactor thirteenth set of 20 query execution errors to use error classes > - > > Key: SPARK-36302 > URL: https://issues.apache.org/jira/browse/SPARK-36302 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the thirteenth set of 20. > {code:java} > serDeInterfaceNotFoundError > convertHiveTableToCatalogTableError > cannotRecognizeHiveTypeError > getTablesByTypeUnsupportedByHiveVersionError > dropTableWithPurgeUnsupportedError > alterTableWithDropPartitionAndPurgeUnsupportedError > invalidPartitionFilterError > getPartitionMetadataByFilterError > unsupportedHiveMetastoreVersionError > loadHiveClientCausesNoClassDefFoundError > cannotFetchTablesOfDatabaseError > illegalLocationClauseForViewPartitionError > renamePathAsExistsPathError > renameAsExistsPathError > renameSrcPathNotFoundError > failedRenameTempFileError > legacyMetadataPathExistsError > partitionColumnNotFoundInSchemaError > stateNotDefinedOrAlreadyRemovedError > cannotSetTimeoutDurationError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36301) Refactor twelfth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391279#comment-17391279 ] dgd_contributor commented on SPARK-36301: - working on this > Refactor twelfth set of 20 query execution errors to use error classes > -- > > Key: SPARK-36301 > URL: https://issues.apache.org/jira/browse/SPARK-36301 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the twelfth set of 20. > {code:java} > cannotRewriteDomainJoinWithConditionsError > decorrelateInnerQueryThroughPlanUnsupportedError > methodCalledInAnalyzerNotAllowedError > cannotSafelyMergeSerdePropertiesError > pairUnsupportedAtFunctionError > onceStrategyIdempotenceIsBrokenForBatchError[TreeType > structuralIntegrityOfInputPlanIsBrokenInClassError > structuralIntegrityIsBrokenAfterApplyingRuleError > ruleIdNotFoundForRuleError > cannotCreateArrayWithElementsExceedLimitError > indexOutOfBoundsOfArrayDataError > malformedRecordsDetectedInRecordParsingError > remoteOperationsUnsupportedError > invalidKerberosConfigForHiveServer2Error > parentSparkUIToAttachTabNotFoundError > inferSchemaUnsupportedForHiveError > requestedPartitionsMismatchTablePartitionsError > dynamicPartitionKeyNotAmongWrittenPartitionPathsError > cannotRemovePartitionDirError > cannotCreateStagingDirError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36298) Refactor ninth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391277#comment-17391277 ] dgd_contributor commented on SPARK-36298: - working on this. > Refactor ninth set of 20 query execution errors to use error classes > > > Key: SPARK-36298 > URL: https://issues.apache.org/jira/browse/SPARK-36298 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the ninth set of 20. > {code:java} > unscaledValueTooLargeForPrecisionError > decimalPrecisionExceedsMaxPrecisionError > outOfDecimalTypeRangeError > unsupportedArrayTypeError > unsupportedJavaTypeError > failedParsingStructTypeError > failedMergingFieldsError > cannotMergeDecimalTypesWithIncompatiblePrecisionAndScaleError > cannotMergeDecimalTypesWithIncompatiblePrecisionError > cannotMergeDecimalTypesWithIncompatibleScaleError > cannotMergeIncompatibleDataTypesError > exceedMapSizeLimitError > duplicateMapKeyFoundError > mapDataKeyArrayLengthDiffersFromValueArrayLengthError > fieldDiffersFromDerivedLocalDateError > failToParseDateTimeInNewParserError > failToFormatDateTimeInNewFormatterError > failToRecognizePatternAfterUpgradeError > failToRecognizePatternError > cannotCastUTF8StringToDataTypeError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36299) Refactor tenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391278#comment-17391278 ] dgd_contributor commented on SPARK-36299: - working on this. > Refactor tenth set of 20 query execution errors to use error classes > > > Key: SPARK-36299 > URL: https://issues.apache.org/jira/browse/SPARK-36299 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the tenth set of 20. > {code:java} > registeringStreamingQueryListenerError > concurrentQueryInstanceError > cannotParseJsonArraysAsStructsError > cannotParseStringAsDataTypeError > failToParseEmptyStringForDataTypeError > failToParseValueForDataTypeError > rootConverterReturnNullError > cannotHaveCircularReferencesInBeanClassError > cannotHaveCircularReferencesInClassError > cannotUseInvalidJavaIdentifierAsFieldNameError > cannotFindEncoderForTypeError > attributesForTypeUnsupportedError > schemaForTypeUnsupportedError > cannotFindConstructorForTypeError > paramExceedOneCharError > paramIsNotIntegerError > paramIsNotBooleanValueError > foundNullValueForNotNullableFieldError > malformedCSVRecordError > elementsOfTupleExceedLimitError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions
[ https://issues.apache.org/jira/browse/SPARK-36343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390283#comment-17390283 ] dgd_contributor commented on SPARK-36343: - [~hyukjin.kwon] In my case, I'm using Spark Thrift Server version 2.4.4 in production, which integrates with Apache Ranger through [https://github.com/yaooqinn/spark-ranger] via the conf: {code:java} spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension{code} and now we want to integrate Spark SQL with Atlas through SAC [https://github.com/hortonworks-spark/spark-atlas-connector], which *does not support Spark 3.x*, via the conf {code:java} spark.sql.extensions com.hortonworks.spark.atlas.sql.SparkExtension{code} > spark 2.4.x spark.sql.extensions support multiple extensions > > > Key: SPARK-36343 > URL: https://issues.apache.org/jira/browse/SPARK-36343 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.8 >Reporter: dgd_contributor >Priority: Major > > Like SPARK-26493 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
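For context, SPARK-26493 (shipped in Spark 3.0) made spark.sql.extensions accept a comma-separated list, so two extensions can coexist in one conf; this ticket asks for the same behavior in the 2.4 line. A hypothetical combined setting, valid only where the list form is supported (Spark 3.0+ or the proposed backport), would look like:

```properties
# Hypothetical: comma-separated extensions, only valid where SPARK-26493 applies
spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension,com.hortonworks.spark.atlas.sql.SparkExtension
```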
[jira] [Updated] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions
[ https://issues.apache.org/jira/browse/SPARK-36343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36343: Description: Like SPARK-26493 (was: Like issue [SPARK-26493]) > spark 2.4.x spark.sql.extensions support multiple extensions > > > Key: SPARK-36343 > URL: https://issues.apache.org/jira/browse/SPARK-36343 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.8 >Reporter: dgd_contributor >Priority: Major > > Like SPARK-26493 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions
dgd_contributor created SPARK-36343: --- Summary: spark 2.4.x spark.sql.extensions support multiple extensions Key: SPARK-36343 URL: https://issues.apache.org/jira/browse/SPARK-36343 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.8 Reporter: dgd_contributor Like issue [SPARK-26493] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388387#comment-17388387 ] dgd_contributor commented on SPARK-36099: - Sorry I wasn't checking the comment recently, I've done the work for the spark core but didn't create a pull request because I've been waiting for the approve in SPARK-36095. Again, truly sorry for your wasted time. [~Shockang] > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387830#comment-17387830 ] dgd_contributor commented on SPARK-36099: - [~Shockang] how is your progress, I already have the work done on my local repo and will make a pull request soon > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36291) Refactor second set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387720#comment-17387720 ] dgd_contributor commented on SPARK-36291: - working on this > Refactor second set of 20 query execution errors to use error classes > - > > Key: SPARK-36291 > URL: https://issues.apache.org/jira/browse/SPARK-36291 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the second set of 20. > {code:java} > inputTypeUnsupportedError > invalidFractionOfSecondError > overflowInSumOfDecimalError > overflowInIntegralDivideError > mapSizeExceedArraySizeWhenZipMapError > copyNullFieldNotAllowedError > literalTypeUnsupportedError > noDefaultForDataTypeError > doGenCodeOfAliasShouldNotBeCalledError > orderedOperationUnsupportedByDataTypeError > regexGroupIndexLessThanZeroError > regexGroupIndexExceedGroupCountError > invalidUrlError > dataTypeOperationUnsupportedError > mergeUnsupportedByWindowFunctionError > dataTypeUnexpectedError > typeUnsupportedError > negativeValueUnexpectedError > addNewFunctionMismatchedWithFunctionError > cannotGenerateCodeForUncomparableTypeError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters
[ https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dgd_contributor updated SPARK-36229: Description: 1/ SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new inconsistency in behaviour where the returned value is different above the 64 char threshold. {noformat} scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show +---+ |conv(repeat(?, 64), 10, 16)| +---+ | 0| +---+ scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show +---+ |conv(repeat(?, 65), 10, 16)| +---+ | | +---+ scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show ++ |conv(repeat(?, 65), 10, -16)| ++ | -1| ++ scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show ++ |conv(repeat(?, 64), 10, -16)| ++ | 0| ++{noformat} 2/ conv should return result equal to max unsigned long value in base toBase when there is overflow {code:java} scala> spark.sql(select conv('aaa0aaa0a', 16, 10)).show // which should be 18446744073709551615 +---+ |conv(aaa0aaa0a, 16, 10)| +---+ | 12297828695278266890| +---+ {code} was: SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new inconsistency in behaviour where the returned value is different above the 64 char threshold. 
{noformat} scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show +---+ |conv(repeat(?, 64), 10, 16)| +---+ | 0| +---+ scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show +---+ |conv(repeat(?, 65), 10, 16)| +---+ | | +---+ scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show ++ |conv(repeat(?, 65), 10, -16)| ++ | -1| ++ scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show ++ |conv(repeat(?, 64), 10, -16)| ++ | 0| ++{noformat} > conv() inconsistently handles invalid strings with > 64 invalid characters > -- > > Key: SPARK-36229 > URL: https://issues.apache.org/jira/browse/SPARK-36229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tim Armstrong >Priority: Major > > 1/ SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new > inconsistency in behaviour where the returned value is different above the 64 > char threshold. > > {noformat} > scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show > +---+ > |conv(repeat(?, 64), 10, 16)| > +---+ > | 0| > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show > +---+ > |conv(repeat(?, 65), 10, 16)| > +---+ > | | > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show > ++ > |conv(repeat(?, 65), 10, -16)| > ++ > | -1| > ++ > scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show > ++ > |conv(repeat(?, 64), 10, -16)| > ++ > | 0| > ++{noformat} > > 2/ conv should return result equal to max unsigned long value in base toBase > when there is overflow > {code:java} > scala> spark.sql(select conv('aaa0aaa0a', 16, 10)).show > // which should be 18446744073709551615 > +---+ > |conv(aaa0aaa0a, 16, 10)| > +---+ > | 12297828695278266890| > +---+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
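To make the expected semantics of both points concrete, here is a hedged Python sketch of how conv() could behave: stop at the first invalid character (returning 0 when nothing was parsed, even for long invalid strings), clamp the accumulated value at the maximum unsigned 64-bit long instead of wrapping on overflow, and reinterpret the result as a signed long when toBase is negative. This is an illustration of the desired behavior described above, not Spark's actual implementation:

```python
LONG_MAX = 2 ** 63 - 1
UNSIGNED_MAX = 2 ** 64 - 1
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def conv(num: str, from_base: int, to_base: int) -> str:
    value, consumed = 0, 0
    for ch in num.strip().lower():
        d = DIGITS.find(ch)
        if d < 0 or d >= abs(from_base):
            break                                   # stop at first invalid digit
        # clamp at the max unsigned long instead of silently wrapping
        value = min(value * abs(from_base) + d, UNSIGNED_MAX)
        consumed += 1
    if consumed == 0:
        return "0"                                  # no parsable digits: always 0
    # a negative toBase means the result is read as a signed 64-bit long
    signed = value - 2 ** 64 if (to_base < 0 and value > LONG_MAX) else value
    base, magnitude, out = abs(to_base), abs(signed), ""
    while True:
        magnitude, rem = divmod(magnitude, base)
        out = DIGITS[rem].upper() + out
        if magnitude == 0:
            break
    return "-" + out if signed < 0 else out
```

Under these rules an all-invalid string yields "0" regardless of its length, and an overflowing input yields 18446744073709551615 in base 10 (or -1 when toBase is -10), matching the consistency the ticket asks for.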
[jira] [Comment Edited] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters
[ https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384666#comment-17384666 ] dgd_contributor edited comment on SPARK-36229 at 7/21/21, 5:44 AM: --- After look closely, I found out that the overflow check in encode is wrong and need to work on too. For example: {code:java} scala> spark.sql(select conv('aaa0aaa0a', 16, 10)).show +---+ |conv(aaa0aaa0a, 16, 10)| +---+ | 12297828695278266890| +---+{code} which should be 18446744073709551615 I will raise a pull request soon was (Author: dc-heros): After look closely, I found out that the overflow check in encode is wrong and need to work on too. For example: {code:java} scala> spark.sql(select conv('aaa0aaa0a', 16, 10)).show +---+ |conv(aaa0aaa0a, 16, 10)| +---+ | 12297828695278266890| +---+ which should be 18446744073709551615{code} I will raise a pull request soon > conv() inconsistently handles invalid strings with > 64 invalid characters > -- > > Key: SPARK-36229 > URL: https://issues.apache.org/jira/browse/SPARK-36229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tim Armstrong >Priority: Major > > SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new > inconsistency in behaviour where the returned value is different above the 64 > char threshold. 
> > {noformat} > scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show > +---+ > |conv(repeat(?, 64), 10, 16)| > +---+ > | 0| > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show > +---+ > |conv(repeat(?, 65), 10, 16)| > +---+ > | | > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show > ++ > |conv(repeat(?, 65), 10, -16)| > ++ > | -1| > ++ > scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show > ++ > |conv(repeat(?, 64), 10, -16)| > ++ > | 0| > ++{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters
[ https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384666#comment-17384666 ] dgd_contributor commented on SPARK-36229: - After look closely, I found out that the overflow check in encode is wrong and need to work on too. For example: {code:java} scala> spark.sql(select conv('aaa0aaa0a', 16, 10)).show +---+ |conv(aaa0aaa0a, 16, 10)| +---+ | 12297828695278266890| +---+ which should be 18446744073709551615{code} I will raise a pull request soon > conv() inconsistently handles invalid strings with > 64 invalid characters > -- > > Key: SPARK-36229 > URL: https://issues.apache.org/jira/browse/SPARK-36229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tim Armstrong >Priority: Major > > SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new > inconsistency in behaviour where the returned value is different above the 64 > char threshold. > > {noformat} > scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show > +---+ > |conv(repeat(?, 64), 10, 16)| > +---+ > | 0| > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show > +---+ > |conv(repeat(?, 65), 10, 16)| > +---+ > | | > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show > ++ > |conv(repeat(?, 65), 10, -16)| > ++ > | -1| > ++ > scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show > ++ > |conv(repeat(?, 64), 10, -16)| > ++ > | 0| > ++{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters
[ https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384610#comment-17384610 ] dgd_contributor commented on SPARK-36229: - thanks, I will look into this > conv() inconsistently handles invalid strings with > 64 invalid characters > -- > > Key: SPARK-36229 > URL: https://issues.apache.org/jira/browse/SPARK-36229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tim Armstrong >Priority: Major > > SPARK-33428 fixed ArrayIndexOutofBoundsException but introduced a new > inconsistency in behaviour where the returned value is different above the 64 > char threshold. > > {noformat} > scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show > +---+ > |conv(repeat(?, 64), 10, 16)| > +---+ > | 0| > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show > +---+ > |conv(repeat(?, 65), 10, 16)| > +---+ > | | > +---+ > scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show > ++ > |conv(repeat(?, 65), 10, -16)| > ++ > | -1| > ++ > scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show > ++ > |conv(repeat(?, 64), 10, -16)| > ++ > | 0| > ++{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'
[ https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380396#comment-17380396 ] dgd_contributor commented on SPARK-34532: - there is addExact func in IntervalUtils to handle overflow > IntervalUtils.add() may result in 'long overflow' > - > > Key: SPARK-34532 > URL: https://issues.apache.org/jira/browse/SPARK-34532 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2 >Reporter: Ted Yu >Priority: Major > > I noticed the following when running test suite: > build/sbt "sql/testOnly *SQLQueryTestSuite" > {code} > 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage > 6416.0 failed 1 times; aborting job > [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds) > 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 > in stage 6476.0 (TID 7789) > java.lang.ArithmeticException: long overflow > at java.lang.Math.multiplyExact(Math.java:892) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) > at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > {code} > {code} > 19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 > in stage 14744.0 (TID 16705) > java.lang.ArithmeticException: long overflow > at java.lang.Math.addExact(Math.java:809) > at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105) > at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97) > {code} > This likely was caused by the following line: > {code} > val microseconds = left.microseconds + right.microseconds > {code} > We should check whether the addition would produce overflow before adding. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
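The addExact helper mentioned in the comment is checked 64-bit addition in the style of java.lang.Math.addExact. In Python, whose integers are unbounded, the guard has to be written explicitly; a minimal sketch, assuming the goal is to fail loudly on overflow rather than wrap:

```python
LONG_MIN, LONG_MAX = -2 ** 63, 2 ** 63 - 1

def add_exact(a: int, b: int) -> int:
    """Checked 64-bit addition: raise on overflow instead of wrapping silently."""
    result = a + b
    if not (LONG_MIN <= result <= LONG_MAX):
        raise ArithmeticError("long overflow")
    return result

def add_interval_microseconds(left_us: int, right_us: int) -> int:
    # interval addition should route through the checked helper, not plain `+`
    return add_exact(left_us, right_us)
```

The fix the ticket suggests is simply to use the checked path for the microseconds (and days/months) additions instead of the raw `+` operator.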
[jira] [Commented] (SPARK-36095) Group exception messages in core/rdd
[ https://issues.apache.org/jira/browse/SPARK-36095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379511#comment-17379511 ] dgd_contributor commented on SPARK-36095: - [~allisonwang-db] I would like to work on this. Can you specify a general rule for Spark core? Would we create a new errors package in org.apache.spark? > Group exception messages in core/rdd > - > > Key: SPARK-36095 > URL: https://issues.apache.org/jira/browse/SPARK-36095 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/rdd' > || Filename|| Count || > | BlockRDD.scala | 2 | > | DoubleRDDFunctions.scala| 1 | > | EmptyRDD.scala | 1 | > | HadoopRDD.scala | 1 | > | LocalCheckpointRDD.scala| 1 | > | NewHadoopRDD.scala | 1 | > | PairRDDFunctions.scala | 7 | > | PipedRDD.scala | 1 | > | RDD.scala | 8 | > | ReliableCheckpointRDD.scala | 4 | > | ReliableRDDCheckpointData.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
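Whatever package layout the maintainers settle on, the grouping pattern these tickets describe is the same: each module raises through a central errors object instead of constructing messages inline. A hypothetical Python sketch of the shape (the object and method names below are illustrative, not Spark's real API):

```python
class SparkCoreErrors:
    """Hypothetical central home for core/rdd error messages."""

    @staticmethod
    def block_does_not_exist_error(block_id: str) -> Exception:
        return ValueError(f"Block {block_id} does not exist")

    @staticmethod
    def checkpoint_dir_not_set_error() -> Exception:
        return RuntimeError("Checkpoint directory has not been set")

# call sites then become one-liners:
# raise SparkCoreErrors.block_does_not_exist_error(block_id)
```

Centralizing the messages makes them easy to audit for consistency and, later, to migrate onto the error-class framework used on the SQL side.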
[jira] [Commented] (SPARK-36022) Respect interval fields in extract
[ https://issues.apache.org/jira/browse/SPARK-36022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376334#comment-17376334 ] dgd_contributor commented on SPARK-36022: - As they are CalendarInterval, which is represented by months, days, and microseconds, I think "2021 years" is equal to "2021 years and 0 months", so why should the last command fail? > Respect interval fields in extract > -- > > Key: SPARK-36022 > URL: https://issues.apache.org/jira/browse/SPARK-36022 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Extract should process only existing fields of interval types. For example: > {code:sql} > spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); > 11 > spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021' YEAR); > 0 > {code} > The last command should fail as the month field isn't present in INTERVAL > YEAR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
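The commenter's argument can be made concrete: a year-month interval is stored as a single total-months count, so INTERVAL '2021' YEAR and INTERVAL '2021-0' YEAR TO MONTH are the same stored value, and EXTRACT(MONTH ...) naturally returns 0 for both. A small sketch of that storage model (an illustration of the argument, not Spark's code):

```python
def make_year_month_interval(years: int, months: int = 0) -> int:
    # year-month intervals collapse to one total-months field when stored
    return years * 12 + months

def extract_month(total_months: int) -> int:
    # MONTH is the remainder after taking out whole years
    return total_months % 12
```

Once stored, the two literals are indistinguishable, which is why the comment questions whether the second query can (or should) fail at all.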
[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372566#comment-17372566 ] dgd_contributor commented on SPARK-35955: - I will raise a pull request soon > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
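The repro's decimal literals appear truncated in this archive, but the failure mode can be shown with plain digit counting: the average itself fits within Decimal's maximum precision of 38 digits, while the intermediate sum needs 39, so the unchecked sum overflows to null before the divide ever happens. A hedged Python illustration of the precision arithmetic (not Spark's code):

```python
MAX_PRECISION = 38          # DecimalType's maximum number of significant digits

big = 10 ** 37              # a 38-digit value: fits in a Decimal(38, 0) column
values = [big] * 12

total = sum(values)         # 12 * 10**37 needs 39 digits: overflows Decimal(38, 0)
average = total // len(values)
```

In ANSI mode this intermediate overflow should raise an exception rather than silently produce a null average, which is the fix the ticket tracks.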
[jira] [Commented] (SPARK-35841) Casting string to decimal type doesn't work if the sum of the digits is greater than 38
[ https://issues.apache.org/jira/browse/SPARK-35841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366924#comment-17366924 ] dgd_contributor commented on SPARK-35841: - I would like to work on this > Casting string to decimal type doesn't work if the sum of the digits is > greater than 38 > --- > > Key: SPARK-35841 > URL: https://issues.apache.org/jira/browse/SPARK-35841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2 > Environment: Tested in a Kubernetes Cluster with Spark 3.1.1 and > Spark 3.1.2 images > (Hadoop 3.2.1, Python 3.9, Scala 2.12.13) >Reporter: Roberto Gelsi >Priority: Major > > Since Spark 3.1.1, NULL is returned when casting a string with many decimal > places to a decimal type. If the sum of the digits before and after the > decimal point is less than 39, a value is returned. From 39 digits, however, > NULL is returned. > This worked until Spark 3.0.X. > Code to reproduce: > * A string with 2 decimal places in front of the decimal point and 37 decimal > places after the decimal point returns null > {code:python} > data = ['28.92599983799625624669715762138'] > dfs = spark.createDataFrame(data, StringType()) > dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)')) > dfd.show(truncate=False) > {code} > +-+ > |value| > +-+ > |null | > +-+ > > * A string with 2 decimal places in front of the decimal point and 36 decimal > places after the decimal point returns the number as decimal > {code:python} > data = ['28.9259998379962562466971576213'] > dfs = spark.createDataFrame(data, StringType()) > dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)')) > dfd.show(truncate=False) > {code} > ++ > |value | > ++ > |28.92600| > ++ > * A string with 1 decimal place in front of the decimal point and 37 decimal > places after the decimal point returns the number as decimal > {code:python} > data = ['2.92599983799625624669715762138'] > dfs = spark.createDataFrame(data, StringType()) 
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)')) > dfd.show(truncate=False) > {code} > +---+ > |value | > +---+ > |2.92600| > +---+ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
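The behavior the reporter expects is that the cast rounds the input to the target scale first and returns null only when the *rounded* value exceeds the precision, regardless of how many digits the source string carries. A hedged Python sketch of those semantics using the standard decimal module (an illustration, not Spark's implementation):

```python
from decimal import ROUND_HALF_UP, Decimal

def cast_string_to_decimal(s: str, precision: int = 10, scale: int = 5):
    """Round to `scale` first; null only if the result overflows `precision`."""
    d = Decimal(s).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if len(d.as_tuple().digits) > precision:
        return None             # genuinely too large for decimal(precision, scale)
    return d
```

Under this rule, both the 37-digit and 36-digit inputs above round to 28.92600; the digit count of the *source string* never decides the outcome, only the magnitude of the rounded result does.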
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364629#comment-17364629 ] dgd_contributor commented on SPARK-33603: - [~beliefer] can I work on this, if you don't mind? > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35064) Group exception messages in spark/sql (catalyst)
[ https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363375#comment-17363375 ] dgd_contributor commented on SPARK-35064: - I would like to work on this > Group exception messages in spark/sql (catalyst) > > > Key: SPARK-35064 > URL: https://issues.apache.org/jira/browse/SPARK-35064 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35622) DataFrame's count function do not need groupBy and avoid shuffle
[ https://issues.apache.org/jira/browse/SPARK-35622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363336#comment-17363336 ] dgd_contributor commented on SPARK-35622: - I ran a benchmark on my computer; df.rdd.count()'s execution time was about 1/4 of df.count()'s. I created a DataFrame with 10 rows and ran 1000 loops: df.rdd.count() took 2671551 nanoseconds, while df.count() took 116798269100 nanoseconds > DataFrame's count function do not need groupBy and avoid shuffle > > > Key: SPARK-35622 > URL: https://issues.apache.org/jira/browse/SPARK-35622 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: xiepengjie >Priority: Major > > Use `df.rdd.count()` replace `df.count()`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
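The nanosecond figures in the comment above could be collected with a harness along these lines (a sketch only — the lambda here is a stand-in workload, since the real measurement would wrap df.count() and df.rdd.count() on a live SparkSession):

```python
import time

def benchmark_ns(fn, loops=1000):
    """Total wall-clock time of `loops` calls to fn, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(loops):
        fn()
    return time.perf_counter_ns() - start

# Stand-in for the 10-row DataFrame count from the comment; with Spark
# running, fn would be `df.count` or `df.rdd.count` instead.
rows = list(range(10))
elapsed = benchmark_ns(lambda: len(rows))
print(elapsed > 0)  # True
```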
[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362824#comment-17362824 ] dgd_contributor commented on SPARK-35563: - After looking into this, I found that rowNumber in RowNumberLike is IntegerType, but I'm not sure whether this is a bug. Should I create a pull request? > [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows > -- > > Key: SPARK-35563 > URL: https://issues.apache.org/jira/browse/SPARK-35563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2 >Reporter: Robert Joseph Evans >Priority: Blocker > Labels: data-loss > > I think this impacts a lot more versions of Spark, but I don't know for sure > because it takes a long time to test. As a part of doing corner case > validation testing for spark rapids I found that if a window function has > more than {{Int.MaxValue + 1}} rows the result is silently truncated to that > many rows. I have only tested this on 3.0.2 with {{row_number}}, but I > suspect it will impact others as well. This is a really rare corner case, but > because it is silent data corruption I personally think it is quite serious. 
> {code:scala} > import org.apache.spark.sql.expressions.Window > val windowSpec = Window.partitionBy("a").orderBy("b") > val df = spark.range(Int.MaxValue.toLong + 100).selectExpr(s"1 as a", "id as > b") > spark.time(df.select(col("a"), col("b"), > row_number().over(windowSpec).alias("rn")).orderBy(desc("a"), > desc("b")).select((col("rn") < 0).alias("dir")).groupBy("dir").count.show(20)) > +-----+----------+ > > | dir| count| > +-----+----------+ > |false|2147483647| > | true| 1| > +-----+----------+ > Time taken: 1139089 ms > Int.MaxValue.toLong + 100 > res15: Long = 2147483747 > 2147483647L + 1 > res16: Long = 2147483648 > {code} > I had to make sure that I ran the above with at least 64GiB of heap for the > executor (I did it in local mode and it worked, but took forever to run) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
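If the row-number counter is indeed a 32-bit integer, incrementing it past Int.MaxValue would wrap to a negative value, which would explain the single `rn < 0` row in the output above. A minimal sketch of 32-bit wraparound (in Python, using ctypes to emulate a JVM Int):

```python
import ctypes

INT_MAX = 2**31 - 1  # JVM Int.MaxValue = 2147483647

# Simulate a 32-bit counter incrementing one step past its maximum:
# the value wraps to Int.MinValue, so a row number computed this way
# goes negative instead of continuing to count up.
wrapped = ctypes.c_int32(INT_MAX + 1).value
print(wrapped)  # -2147483648
```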
[jira] [Commented] (SPARK-32891) Enhance UTF8String.trim
[ https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362766#comment-17362766 ] dgd_contributor commented on SPARK-32891: - After looking into this and running a few benchmarks, I don't see a big difference in execution time between [UTF8String].trim().toString() and [UTF8String].toString().trim(): over 100 loops on my computer, [UTF8String].trim().toString() took 13168899600 nanoseconds and [UTF8String].toString().trim() took 11813350700 nanoseconds > Enhance UTF8String.trim > --- > > Key: SPARK-32891 > URL: https://issues.apache.org/jira/browse/SPARK-32891 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > It sounds like {{UTF8String.trim}} is not implemented well. We may need to > look at how {{java.lang.String.trim}} is implemented. > Please see comment: > > [https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675] > [https://github.com/apache/spark/pull/29731#discussion_r487709672] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
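The two orderings being compared — trim on the UTF-8 bytes first versus convert to a String first — have a rough Python analogue that can be timed the same way (a hypothetical sketch only, not the Spark benchmark itself): strip ASCII whitespace from the raw bytes before decoding, or decode first and then strip.

```python
import timeit

# Analogue of [UTF8String].trim().toString() vs [UTF8String].toString().trim():
# both orderings must produce the same string; only the timing differs.
raw = b"   hello world   " * 10

def trim_then_decode():
    return raw.strip().decode("utf-8")

def decode_then_trim():
    return raw.decode("utf-8").strip()

assert trim_then_decode() == decode_then_trim()
t1 = timeit.timeit(trim_then_decode, number=100)
t2 = timeit.timeit(decode_then_trim, number=100)
print(t1 > 0 and t2 > 0)  # True
```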
[jira] [Commented] (SPARK-35622) DataFrame's count function do not need groupBy and avoid shuffle
[ https://issues.apache.org/jira/browse/SPARK-35622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362681#comment-17362681 ] dgd_contributor commented on SPARK-35622: - Hi, could you explain this issue in more detail? Where should we replace df.count() with df.rdd.count()? > DataFrame's count function do not need groupBy and avoid shuffle > > > Key: SPARK-35622 > URL: https://issues.apache.org/jira/browse/SPARK-35622 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: xiepengjie >Priority: Major > > Use `df.rdd.count()` replace `df.count()`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org