[jira] [Commented] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425947#comment-17425947
 ] 

dgd_contributor commented on SPARK-36952:
-

Working on this.

> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py
> ---
>
> Key: SPARK-36952
> URL: https://issues.apache.org/jira/browse/SPARK-36952
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py






[jira] [Issue Comment Deleted] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36952:

Comment: was deleted

(was: Working on this.)

> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py
> ---
>
> Key: SPARK-36952
> URL: https://issues.apache.org/jira/browse/SPARK-36952
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py






[jira] [Created] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36952:
---

 Summary: Inline type hints for 
python/pyspark/resource/information.py and python/pyspark/resource/profile.py
 Key: SPARK-36952
 URL: https://issues.apache.org/jira/browse/SPARK-36952
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


Inline type hints for python/pyspark/resource/information.py and 
python/pyspark/resource/profile.py






[jira] [Created] (SPARK-36946) Support time for ps.to_datetime

2021-10-07 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36946:
---

 Summary: Support time for ps.to_datetime
 Key: SPARK-36946
 URL: https://issues.apache.org/jira/browse/SPARK-36946
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor









[jira] [Created] (SPARK-36902) Migrate CreateTableAsSelectStatement to v2 command

2021-09-30 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36902:
---

 Summary: Migrate CreateTableAsSelectStatement to v2 command
 Key: SPARK-36902
 URL: https://issues.apache.org/jira/browse/SPARK-36902
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: dgd_contributor









[jira] [Updated] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-09-28 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36886:

Description: Inline type hints for python/pyspark/sql/context.py from 
python/pyspark/sql/context.pyi.  (was: Inline type hints for 
python/pyspark/sql/column.py from python/pyspark/sql/column.pyi.)

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/context.py from 
> python/pyspark/sql/context.pyi.






[jira] [Updated] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-09-28 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36886:

Summary: Inline type hints for python/pyspark/sql/context.py  (was: Inline 
type hints for python/pyspark/sql/column.py)

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/column.py from 
> python/pyspark/sql/column.pyi.






[jira] [Commented] (SPARK-36887) Inline type hints for python/pyspark/sql/conf.py

2021-09-28 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421880#comment-17421880
 ] 

dgd_contributor commented on SPARK-36887:
-

I'm working on this.

> Inline type hints for python/pyspark/sql/conf.py
> 
>
> Key: SPARK-36887
> URL: https://issues.apache.org/jira/browse/SPARK-36887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/conf.py from python/pyspark/sql/conf.pyi.






[jira] [Created] (SPARK-36887) Inline type hints for python/pyspark/sql/conf.py

2021-09-28 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36887:
---

 Summary: Inline type hints for python/pyspark/sql/conf.py
 Key: SPARK-36887
 URL: https://issues.apache.org/jira/browse/SPARK-36887
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


Inline type hints for python/pyspark/sql/conf.py from python/pyspark/sql/conf.pyi.






[jira] [Commented] (SPARK-36886) Inline type hints for python/pyspark/sql/column.py

2021-09-28 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421878#comment-17421878
 ] 

dgd_contributor commented on SPARK-36886:
-

Working on this.

> Inline type hints for python/pyspark/sql/column.py
> --
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/column.py from 
> python/pyspark/sql/column.pyi.






[jira] [Created] (SPARK-36886) Inline type hints for python/pyspark/sql/column.py

2021-09-28 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36886:
---

 Summary: Inline type hints for python/pyspark/sql/column.py
 Key: SPARK-36886
 URL: https://issues.apache.org/jira/browse/SPARK-36886
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


Inline type hints for python/pyspark/sql/column.py from 
python/pyspark/sql/column.pyi.






[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-28 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421859#comment-17421859
 ] 

dgd_contributor commented on SPARK-36845:
-

Can I work on this? :D

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.
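
For illustration, a minimal sketch of the difference (hypothetical class and members, not taken from the Spark sources):

{code:python}
# Before: the hints live only in a stub file (e.g. resource/information.pyi):
#     class Information:
#         def __init__(self, name: str, amount: float) -> None: ...
#
# After: the same hints are inlined into the .py module itself, so a type
# checker can verify the function bodies as well, not just the signatures.
class Information:
    def __init__(self, name: str, amount: float) -> None:
        self._name = name
        self._amount = amount

    @property
    def name(self) -> str:
        return self._name
{code}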






[jira] [Commented] (SPARK-36785) Fix ps.DataFrame.isin

2021-09-16 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416283#comment-17416283
 ] 

dgd_contributor commented on SPARK-36785:
-

I am working on this.

> Fix ps.DataFrame.isin
> -
>
> Key: SPARK-36785
> URL: https://issues.apache.org/jira/browse/SPARK-36785
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> psdf = ps.DataFrame(
> ... {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 
> 1, None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
> ... )
> >>> 
> >>> psdf
>      a    b  c
> 0  NaN  NaN  1
> 1  2.0  5.0  5
> 2  3.0  NaN  1
> 3  4.0  3.0  3
> 4  5.0  2.0  2
> 5  6.0  1.0  1
> 6  7.0  NaN  1
> 7  8.0  0.0  0
> 8  NaN  0.0  0
> >>> other = [1, 2, None]
> >>> psdf.isin(other)
>       a     b     c
> 0  None  None  True
> 1  True  None  None
> 2  None  None  True
> 3  None  None  None
> 4  None  True  True
> 5  None  True  True
> 6  None  None  True
> 7  None  None  None
> 8  None  None  None
> >>> psdf.isin(other).dtypes
> a    bool
> b    bool
> c    bool
> dtype: object
> >>> psdf.to_pandas().isin(other).dtypes
> a    bool
> b    bool
> c    bool
> dtype: object
> >>> psdf.to_pandas().isin(other)
>        a      b      c
> 0  False  False   True
> 1   True  False  False
> 2  False  False   True
> 3  False  False  False
> 4  False   True   True
> 5  False   True   True
> 6  False  False   True
> 7  False  False  False
> 8  False  False  False
> >>> 
> {code}
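
Until the fix lands, a possible workaround (a sketch, assuming the object-dtype result shown above) is to map the spurious None entries back to False:

{code:python}
# Hypothetical workaround, not the eventual fix: fill the None entries
# and coerce the frame back to real booleans, matching the pandas output.
fixed = psdf.isin(other).fillna(False).astype(bool)
{code}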






[jira] [Created] (SPARK-36785) Fix ps.DataFrame.isin

2021-09-16 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36785:
---

 Summary: Fix ps.DataFrame.isin
 Key: SPARK-36785
 URL: https://issues.apache.org/jira/browse/SPARK-36785
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


{code:python}
>>> psdf = ps.DataFrame(
... {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 1, 
None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
... )
>>> 
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]

>>> psdf.isin(other)
      a     b     c
0  None  None  True
1  True  None  None
2  None  None  True
3  None  None  None
4  None  True  True
5  None  True  True
6  None  None  True
7  None  None  None
8  None  None  None
>>> psdf.isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
>>> 

{code}






[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values

2021-09-16 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Affects Version/s: (was: 3.3)
   3.3.0

> Fix Series.isin when Series has NaN values
> --
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}
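
The same workaround idea as for the DataFrame case applies here; as a sketch (not the eventual fix):

{code:python}
# Fill the spurious None entries so the result matches the pandas output above.
psser.isin([1, 3, 5, None]).fillna(False)
{code}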






[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values

2021-09-16 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Affects Version/s: 3.2.0

> Fix Series.isin when Series has NaN values
> --
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0, 3.3
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}






[jira] [Updated] (SPARK-36779) Error when list of data type tuples has len = 1

2021-09-16 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36779:

Summary: Error when list of data type tuples has len = 1  (was: Fix when 
list of data type tuples has len = 1)

> Error when list of data type tuples has len = 1
> ---
>
> Key: SPARK-36779
> URL: https://issues.apache.org/jira/browse/SPARK-36779
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> ps.DataFrame[("a", int), [int]]
> typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, int]
> >>> ps.DataFrame[("a", int), [("b", int)]]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/frame.py", line 11998, in 
> __class_getitem__
> return create_tuple_for_frame_type(params)
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 685, in create_tuple_for_frame_type
> return Tuple[extract_types(params)]
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 755, in extract_types
> return (index_type,) + extract_types(data_types)
>   File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 770, in extract_types
> raise TypeError(
> TypeError: Type hints should be specified as one of:
>   - DataFrame[type, type, ...]
>   - DataFrame[name: type, name: type, ...]
>   - DataFrame[dtypes instance]
>   - DataFrame[zip(names, types)]
>   - DataFrame[index_type, [type, ...]]
>   - DataFrame[(index_name, index_type), [(name, type), ...]]
>   - DataFrame[dtype instance, dtypes instance]
>   - DataFrame[(index_name, index_type), zip(names, types)]
> However, got [('b', <class 'int'>)].
> {code}
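
The supported forms from the error message can be exercised directly; a sketch with hypothetical index and column names (the last call is the len-1 case this ticket is about):

{code:python}
import pyspark.pandas as ps

ps.DataFrame[int, int]                                # DataFrame[type, type, ...]
ps.DataFrame[("idx", int), [int, int]]                # DataFrame[index_type, [type, ...]]
ps.DataFrame[("idx", int), [("a", int), ("b", int)]]  # (index_name, index_type), [(name, type), ...]
ps.DataFrame[("idx", int), [("a", int)]]              # the failing len-1 case
{code}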






[jira] [Created] (SPARK-36779) Fix when list of data type tuples has len = 1

2021-09-16 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36779:
---

 Summary: Fix when list of data type tuples has len = 1
 Key: SPARK-36779
 URL: https://issues.apache.org/jira/browse/SPARK-36779
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor



{code:python}
>>> ps.DataFrame[("a", int), [int]]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, int]

>>> ps.DataFrame[("a", int), [("b", int)]]
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/dgd/spark/python/pyspark/pandas/frame.py", line 11998, in 
__class_getitem__
return create_tuple_for_frame_type(params)
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 685, 
in create_tuple_for_frame_type
return Tuple[extract_types(params)]
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 755, 
in extract_types
return (index_type,) + extract_types(data_types)
  File "/Users/dgd/spark/python/pyspark/pandas/typedef/typehints.py", line 770, 
in extract_types
raise TypeError(
TypeError: Type hints should be specified as one of:
  - DataFrame[type, type, ...]
  - DataFrame[name: type, name: type, ...]
  - DataFrame[dtypes instance]
  - DataFrame[zip(names, types)]
  - DataFrame[index_type, [type, ...]]
  - DataFrame[(index_name, index_type), [(name, type), ...]]
  - DataFrame[dtype instance, dtypes instance]
  - DataFrame[(index_name, index_type), zip(names, types)]
However, got [('b', <class 'int'>)].

{code}







[jira] [Updated] (SPARK-36762) Fix Series.isin when Series has NaN values

2021-09-15 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Summary: Fix Series.isin when Series has NaN values  (was: Fix Series.isin 
when it have NaN)

> Fix Series.isin when Series has NaN values
> --
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}






[jira] [Updated] (SPARK-36762) Fix Series.isin when it have NaN

2021-09-15 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Summary: Fix Series.isin when it have NaN  (was: Fix Series.isin when 
values have NaN)

> Fix Series.isin when it have NaN
> 
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}






[jira] [Updated] (SPARK-36762) Fix Series.isin when values have NaN

2021-09-15 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Description: 
{code:python}
>>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
>>> psser = ps.from_pandas(pser)
>>> pser.isin([1, 3, 5, None])
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7    False
8    False
dtype: bool
>>> psser.isin([1, 3, 5, None])
0    None
1    True
2    None
3    True
4    None
5    True
6    None
7    None
8    None
dtype: object

{code}

> Fix Series.isin when values have NaN
> 
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([None, 5, None, 3, 2, 1, None, 0, 0])
> >>> psser = ps.from_pandas(pser)
> >>> pser.isin([1, 3, 5, None])
> 0    False
> 1     True
> 2    False
> 3     True
> 4    False
> 5     True
> 6    False
> 7    False
> 8    False
> dtype: bool
> >>> psser.isin([1, 3, 5, None])
> 0    None
> 1    True
> 2    None
> 3    True
> 4    None
> 5    True
> 6    None
> 7    None
> 8    None
> dtype: object
> {code}






[jira] [Updated] (SPARK-36762) Fix Series.isin when values have NaN

2021-09-15 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Summary: Fix Series.isin when values have NaN  (was: Support list-like 
types for pandas/base.isin)

> Fix Series.isin when values have NaN
> 
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3
>Reporter: dgd_contributor
>Priority: Major
>







[jira] [Updated] (SPARK-36762) Support list-like types for pandas/base.isin

2021-09-14 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36762:

Summary: Support list-like types for pandas/base.isin  (was: Support 
list-like types for base.isin)

> Support list-like types for pandas/base.isin
> 
>
> Key: SPARK-36762
> URL: https://issues.apache.org/jira/browse/SPARK-36762
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3
>Reporter: dgd_contributor
>Priority: Major
>







[jira] [Created] (SPARK-36762) Support list-like types for base.isin

2021-09-14 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36762:
---

 Summary: Support list-like types for base.isin
 Key: SPARK-36762
 URL: https://issues.apache.org/jira/browse/SPARK-36762
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3
Reporter: dgd_contributor









[jira] [Commented] (SPARK-36742) Fix ps.to_datetime with plurals of keys like years, months, days

2021-09-13 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414194#comment-17414194
 ] 

dgd_contributor commented on SPARK-36742:
-

Working on this.

 

> Fix ps.to_datetime with plurals of keys like years, months, days
> 
>
> Key: SPARK-36742
> URL: https://issues.apache.org/jira/browse/SPARK-36742
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>







[jira] [Created] (SPARK-36742) Fix ps.to_datetime with plurals of keys like years, months, days

2021-09-13 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36742:
---

 Summary: Fix ps.to_datetime with plurals of keys like years, 
months, days
 Key: SPARK-36742
 URL: https://issues.apache.org/jira/browse/SPARK-36742
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor









[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other than year column Pyspark - koalas

2021-09-13 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414179#comment-17414179
 ] 

dgd_contributor commented on SPARK-36728:
-

Thanks. The pandas API on Spark does not work with plurals of the keys; I can make a 
PR to fix it.

About:

"In 
[https://github.com/pandas-dev/pandas/blob/73c68257545b5f8530b7044f56647bd2db92e2ba/pandas/core/tools/datetimes.py#L922] 
there is a hardcoded list of what your columns can be named. It's very bad practice."

Could you give your opinion? [~hyukjin.kwon] [~ueshin]

> Can't create datetime object from anything other than year column Pyspark - 
> koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
>
> If I create a datetime object it must be from columns named year.
>  
> df = ps.DataFrame(\{'year': [2015, 2016],df = ps.DataFrame({'year': [2015, 
> 2016],                   'month': [2, 3],                    'day': [4, 5],   
>                  'hour': [2, 3],                    'minute': [10, 30],       
>              'second': [21,25]}) df.info()
> Int64Index: 2 entries, 1 to 0Data 
> columns (total 6 columns): #   Column  Non-Null Count  Dtype---  --  
> --  - 0   year    2 non-null      int64 1   month   2 
> non-null      int64 2   day     2 non-null      int64 3   hour    2 non-null  
>     int64 4   minute  2 non-null      int64 5   second  2 non-null      
> int64dtypes: int64(6)
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> Int64Index: 2 entries, 1 to 0Data 
> columns (total 7 columns): #   Column  Non-Null Count  Dtype     ---  --  
> --  -      0   year    2 non-null      int64      1   month   
> 2 non-null      int64      2   day     2 non-null      int64      3   hour    
> 2 non-null      int64      4   minute  2 non-null      int64      5   second  
> 2 non-null      int64      6   date    2 non-null      datetime64dtypes: 
> datetime64(1), int64(6)
> df_test = ps.DataFrame(\{'testyear': [2015, 2016],                   
> 'testmonth': [2, 3],                    'testday': [4, 5],                    
> 'hour': [2, 3],                    'minute': [10, 30],                    
> 'second': [21,25]}) df_test['date'] = ps.to_datetime(df[['testyear', 
> 'testmonth', 'testday']])
> ---KeyError
>                                   Traceback (most recent call 
> last)/tmp/ipykernel_73/904491906.py in > 1 df_test['date'] = 
> ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)  11853    
>          return self.loc[:, key]  11854         elif is_list_like(key):> 
> 11855             return self.loc[:, list(key)]  11856         raise 
> NotImplementedError(key)  11857 
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)    476 
>                 returns_series,    477                 series_name,--> 478    
>          ) = self._select_cols(cols_sel)    479     480             if cond 
> is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, 
> missing_keys)    322             return self._select_cols_else(cols_sel, 
> missing_keys)    323         elif is_list_like(cols_sel):--> 324             
> return self._select_cols_by_iterable(cols_sel, missing_keys)    325         
> else:    326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in 
> _select_cols_by_iterable(self, cols_sel, missing_keys)   1352                 
> if not found:   1353                     if missing_keys is None:-> 1354      
>                    raise KeyError("['{}'] not in 
> index".format(name_like_string(key)))   1355                     else:   1356 
>                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
> df_test
> testyear testmonth testday hour minute second0 2015 2 4 2 10 211 2016 3 5 3 
> 30 25






[jira] [Commented] (SPARK-36728) Can't create datetime object from anything other than year column Pyspark - koalas

2021-09-12 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413725#comment-17413725
 ] 

dgd_contributor commented on SPARK-36728:
-

I think this is not a bug; pandas has the same behavior. We need to name the columns 
like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals 
of the same. 
[docs|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#]


{code:python}
>>> df_test = pd.DataFrame(
... {
... 'testyear': [2015, 2016],
... 'testmonth': [2, 3],
... 'testday': [4, 5],
... }
... ) 
>>> pd.to_datetime(df_test[['testyear', 'testmonth', 'testday']])
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/dgd/spark/python/venv/lib/python3.8/site-packages/pandas/core/tools/datetimes.py",
 line 890, in to_datetime
result = _assemble_from_unit_mappings(arg, errors, tz)
  File 
"/Users/dgd/spark/python/venv/lib/python3.8/site-packages/pandas/core/tools/datetimes.py",
 line 996, in _assemble_from_unit_mappings
raise ValueError(
ValueError: to assemble mappings requires at least that [year, month, day] be 
specified: [day,month,year] is missing
>>>
{code}
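
Plurals of the same keys are also accepted by pandas (per the docs linked above), which is exactly what pandas-on-Spark was missing; a quick pandas check:

{code:python}
>>> import pandas as pd
>>> pd.to_datetime(pd.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]}))
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
{code}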


> Can't create datetime object from anything other than year column Pyspark - 
> koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: pyspark_date.txt
>
>
> If I create a datetime object it must be from columns named year.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>    11853             return self.loc[:, key]
>    11854         elif is_list_like(key):
> --> 11855             return self.loc[:, list(key)]
>    11856         raise NotImplementedError(key)
>    11857
>
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480             if cond is None and limit is None and returns_series:
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
>

[jira] [Updated] (SPARK-36722) Problems with update function in koalas - pyspark pandas.

2021-09-11 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36722:

Affects Version/s: 3.2.0

> Problems with update function in koalas - pyspark pandas.
> -
>
> Key: SPARK-36722
> URL: https://issues.apache.org/jira/browse/SPARK-36722
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Hi, I am using "from pyspark import pandas as ps" in a master build from yesterday.
> I have some columns that I need to join into one.
> In pandas I use update.
>
> 54  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION    23 non-null     object
> 55  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P  24348 non-null  object
>
> pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P'])
>
> ---------------------------------------------------------------------------
> AssertionError                            Traceback (most recent call last)
> /tmp/ipykernel_73/391781247.py in <module>
> ----> 1 pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P'])
>
> /opt/spark/python/pyspark/pandas/series.py in update(self, other)
>    4549             raise TypeError("'other' must be a Series")
>    4550
> -> 4551         combined = combine_frames(self._psdf, other._psdf, how="leftouter")
>    4552
>    4553         this_scol = combined["this"]._internal.spark_column_for(self._column_label)
>
> /opt/spark/python/pyspark/pandas/utils.py in combine_frames(this, how, preserve_order_column, *args)
>     139     elif len(args) == 1 and isinstance(args[0], DataFrame):
>     140         assert isinstance(args[0], DataFrame)
> --> 141         assert not same_anchor(
>     142             this, args[0]
>     143         ), "We don't need to combine. `this` and `that` are same."
>
> AssertionError: We don't need to combine. `this` and `that` are same.
>
> pd1.info()
> 54  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION    23 non-null     object
> 55  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P  24348 non-null  object
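
A minimal repro of the assertion, distilled from the traceback above (hypothetical column names; both Series share the same anchor DataFrame, which is what trips same_anchor):

{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": ["x", None], "a.p": ["y", "z"]})
# Both operands come from the same DataFrame, so combine_frames() asserts:
# "We don't need to combine. `this` and `that` are same."
psdf["a"].update(psdf["a.p"])
{code}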






[jira] [Commented] (SPARK-36722) Problems with update function in koalas - pyspark pandas.

2021-09-11 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413632#comment-17413632
 ] 

dgd_contributor commented on SPARK-36722:
-

I will make a PR to fix this one soon.

> Problems with update function in koalas - pyspark pandas.
> -
>
> Key: SPARK-36722
> URL: https://issues.apache.org/jira/browse/SPARK-36722
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Hi, I am using "from pyspark import pandas as ps" in a master build from yesterday.
> I have some columns that I need to join into one.
> In pandas I use update.
>
> 54  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION    23 non-null     object
> 55  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P  24348 non-null  object
>
> pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P'])
>
> ---------------------------------------------------------------------------
> AssertionError                            Traceback (most recent call last)
> /tmp/ipykernel_73/391781247.py in <module>
> ----> 1 pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION'].update(pd1['FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P'])
>
> /opt/spark/python/pyspark/pandas/series.py in update(self, other)
>    4549             raise TypeError("'other' must be a Series")
>    4550
> -> 4551         combined = combine_frames(self._psdf, other._psdf, how="leftouter")
>    4552
>    4553         this_scol = combined["this"]._internal.spark_column_for(self._column_label)
>
> /opt/spark/python/pyspark/pandas/utils.py in combine_frames(this, how, preserve_order_column, *args)
>     139     elif len(args) == 1 and isinstance(args[0], DataFrame):
>     140         assert isinstance(args[0], DataFrame)
> --> 141         assert not same_anchor(
>     142             this, args[0]
>     143         ), "We don't need to combine. `this` and `that` are same."
>
> AssertionError: We don't need to combine. `this` and `that` are same.
>
> pd1.info()
> 54  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION    23 non-null     object
> 55  FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION.P  24348 non-null  object






[jira] [Updated] (SPARK-35823) Spark UI-Stages DAG visualization is empty in IE11

2021-09-10 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35823:

Affects Version/s: (was: 3.1.1)
   3.3.0
   3.2.0
   3.0.3
   3.1.2

> Spark UI-Stages DAG visualization is empty in IE11
> --
>
> Key: SPARK-35823
> URL: https://issues.apache.org/jira/browse/SPARK-35823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: jobit mathew
>Priority: Minor
>







[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-09-10 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35821:

Affects Version/s: (was: 3.1.1)
   3.3.0
   3.2.0
   3.0.3
   3.1.2

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> Other tabs look OK.
> Spark job history shows the completed and incomplete application lists, but when 
> we go inside each application the same issue may be there.
> Attaching some screenshots.






[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11

2021-09-10 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35822:

Affects Version/s: 3.3.0

> Spark UI-Executor tab is empty in IE11
> --
>
> Key: SPARK-35822
> URL: https://issues.apache.org/jira/browse/SPARK-35822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG
>
>
> The issue is present in YARN mode.






[jira] [Updated] (SPARK-36685) Fix wrong assert statement

2021-09-10 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36685:

Affects Version/s: (was: 3.1.2)
   3.3.0
   3.1.3
   3.2.0
   2.4.8
   3.0.3

> Fix wrong assert statement
> --
>
> Key: SPARK-36685
> URL: https://issues.apache.org/jira/browse/SPARK-36685
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.4.8, 3.0.3, 3.2.0, 3.1.3, 3.3.0
>Reporter: Sanket Reddy
>Priority: Trivial
>
> {code:scala}
> require(numCols == mat.numCols, "The number of rows of the matrices in this sequence, " + "don't match!")
> {code}
> Should the error message be "The number of columns..."?
> This issue also appears in open-source Spark:
>  
> [https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala#L1266]






[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11

2021-09-09 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35822:

Affects Version/s: (was: 3.1.1)
   3.2.0
   3.0.3
   3.1.2

> Spark UI-Executor tab is empty in IE11
> --
>
> Key: SPARK-35822
> URL: https://issues.apache.org/jira/browse/SPARK-35822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG
>
>
> The issue is present in YARN mode.






[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-09-09 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35821:

Issue Type: Umbrella  (was: New Feature)

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> Other tabs look OK.
> Spark job history shows the completed and incomplete application lists, but when 
> we go inside each application the same issue may be there.
> Attaching some screenshots.






[jira] [Updated] (SPARK-35822) Spark UI-Executor tab is empty in IE11

2021-09-09 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-35822:

Attachment: Executortab_IE.PNG
Executortab_Chrome.png

> Spark UI-Executor tab is empty in IE11
> --
>
> Key: SPARK-35822
> URL: https://issues.apache.org/jira/browse/SPARK-35822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG
>
>
> In yarn mode issue there






[jira] [Commented] (SPARK-35823) Spark UI-Stages DAG visualization is empty in IE11

2021-09-08 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412080#comment-17412080
 ] 

dgd_contributor commented on SPARK-35823:
-

I'm working on this.

> Spark UI-Stages DAG visualization is empty in IE11
> --
>
> Key: SPARK-35823
> URL: https://issues.apache.org/jira/browse/SPARK-35823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
>







[jira] [Commented] (SPARK-35822) Spark UI-Executor tab is empty in IE11

2021-09-08 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412079#comment-17412079
 ] 

dgd_contributor commented on SPARK-35822:
-

I'm working on this.

> Spark UI-Executor tab is empty in IE11
> --
>
> Key: SPARK-35822
> URL: https://issues.apache.org/jira/browse/SPARK-35822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
>
> The issue is present in YARN mode.






[jira] [Updated] (SPARK-36671) Support Series.__and__ for Integral

2021-09-08 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36671:

Summary: Support Series.__and__ for Integral  (was: Support __and__ in 
num_ops.py)

> Support Series.__and__ for Integral
> ---
>
> Key: SPARK-36671
> URL: https://issues.apache.org/jira/browse/SPARK-36671
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> 0    False
> 1    False
> 2     True
> 3    False
> dtype: bool
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6])
> >>> pser1 & pser2
> 0    0
> 1    0
> 2    2
> dtype: int64
> >>> pser1 = ps.Series([1, 2, 3])
> >>> pser2 = ps.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
> return self._dtype_op.__and__(self, other)
>   File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
> 317, in __and__
> raise TypeError("Bitwise and can not be applied to %s." % 
> self.pretty_name)
> TypeError: Bitwise and can not be applied to integrals.
> {code}
>  
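
For integral operands, pandas falls back to element-wise bitwise AND rather than a boolean operation, which is the behavior requested here; in plain Python:

{code:python}
>>> [a & b for a, b in zip([1, 2, 3], [4, 5, 6])]  # same semantics as the int64 pandas result above
[0, 0, 2]
{code}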






[jira] [Commented] (SPARK-36671) Support __and__ in num_ops.py

2021-09-07 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411009#comment-17411009
 ] 

dgd_contributor commented on SPARK-36671:
-

Working on this.

> Support __and__ in num_ops.py
> -
>
> Key: SPARK-36671
> URL: https://issues.apache.org/jira/browse/SPARK-36671
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> 0    False
> 1    False
> 2     True
> 3    False
> dtype: bool
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6])
> >>> pser1 & pser2
> 0    0
> 1    0
> 2    2
> dtype: int64
> >>> pser1 = ps.Series([1, 2, 3])
> >>> pser2 = ps.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
> return self._dtype_op.__and__(self, other)
>   File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
> 317, in __and__
> raise TypeError("Bitwise and can not be applied to %s." % 
> self.pretty_name)
> TypeError: Bitwise and can not be applied to integrals.
> {code}
>  






[jira] [Updated] (SPARK-36671) Support __and__ in num_ops.py

2021-09-05 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36671:

Description: 
{code:python}
>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6, 7])
>>> pser1 & pser2
0    False
1    False
2     True
3    False
dtype: bool

>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6])
>>> pser1 & pser2
0    0
1    0
2    2
dtype: int64

>>> pser1 = ps.Series([1, 2, 3])
>>> pser2 = ps.Series([4, 5, 6, 7])
>>> pser1 & pser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
return self._dtype_op.__and__(self, other)
  File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
317, in __and__
raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
TypeError: Bitwise and can not be applied to integrals.

{code}
 

  was:
{code:python}
>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6, 7])
>>> pser1 & pser2
0    False
1    False
2     True
3    False
dtype: bool

>>> pser1 = ps.Series([1, 2, 3])
>>> pser2 = ps.Series([4, 5, 6, 7])
>>> pser1 & pser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
return self._dtype_op.__and__(self, other)
  File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
317, in __and__
raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
TypeError: Bitwise and can not be applied to integrals.

{code}
 


> Support __and__ in num_ops.py
> -
>
> Key: SPARK-36671
> URL: https://issues.apache.org/jira/browse/SPARK-36671
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> 0    False
> 1    False
> 2     True
> 3    False
> dtype: bool
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6])
> >>> pser1 & pser2
> 0    0
> 1    0
> 2    2
> dtype: int64
> >>> pser1 = ps.Series([1, 2, 3])
> >>> pser2 = ps.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
> return self._dtype_op.__and__(self, other)
>   File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
> 317, in __and__
> raise TypeError("Bitwise and can not be applied to %s." % 
> self.pretty_name)
> TypeError: Bitwise and can not be applied to integrals.
> {code}
>  






[jira] [Updated] (SPARK-36671) Support __and__ in num_ops.py

2021-09-05 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36671:

Description: 
{code:python}
>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6, 7])
>>> pser1 & pser2
0    False
1    False
2     True
3    False
dtype: bool

>>> pser1 = ps.Series([1, 2, 3])
>>> pser2 = ps.Series([4, 5, 6, 7])
>>> pser1 & pser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
return self._dtype_op.__and__(self, other)
  File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 
317, in __and__
raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
TypeError: Bitwise and can not be applied to integrals.

{code}
 

  was:
{code:python}
>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6, 7])
>>> pser1 ^ pser2
0     True
1     True
2     True
3    False
dtype: bool

>>> pser1 = ps.Series([1, 2, 3])
>>> pser2 = ps.Series([4, 5, 6, 7])
>>> pser1 & pser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
    return self._dtype_op.__and__(self, other)
  File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__
    raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
TypeError: Bitwise and can not be applied to integrals.

{code}
 


> Support __and__ in num_ops.py
> -
>
> Key: SPARK-36671
> URL: https://issues.apache.org/jira/browse/SPARK-36671
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> {code:python}
> >>> pser1 = pd.Series([1, 2, 3])
> >>> pser2 = pd.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> 0    False
> 1    False
> 2     True
> 3    False
> dtype: bool
> >>> pser1 = ps.Series([1, 2, 3])
> >>> pser2 = ps.Series([4, 5, 6, 7])
> >>> pser1 & pser2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
>     return self._dtype_op.__and__(self, other)
>   File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__
>     raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
> TypeError: Bitwise and can not be applied to integrals.
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36671) Support __and__ in num_ops.py

2021-09-05 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36671:
---

 Summary: Support __and__ in num_ops.py
 Key: SPARK-36671
 URL: https://issues.apache.org/jira/browse/SPARK-36671
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


{code:python}
>>> pser1 = pd.Series([1, 2, 3])
>>> pser2 = pd.Series([4, 5, 6, 7])
>>> pser1 ^ pser2
0     True
1     True
2     True
3    False
dtype: bool

>>> pser1 = ps.Series([1, 2, 3])
>>> pser2 = ps.Series([4, 5, 6, 7])
>>> pser1 & pser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dgd/spark/python/pyspark/pandas/base.py", line 423, in __and__
    return self._dtype_op.__and__(self, other)
  File "/Users/dgd/spark/python/pyspark/pandas/data_type_ops/base.py", line 317, in __and__
    raise TypeError("Bitwise and can not be applied to %s." % self.pretty_name)
TypeError: Bitwise and can not be applied to integrals.

{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36402) Implement Series.combine

2021-08-26 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405172#comment-17405172
 ] 

dgd_contributor commented on SPARK-36402:
-

working on this

> Implement Series.combine
> 
>
> Key: SPARK-36402
> URL: https://issues.apache.org/jira/browse/SPARK-36402
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36515) Improve test coverage for groupby.py and window.py.

2021-08-26 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405058#comment-17405058
 ] 

dgd_contributor commented on SPARK-36515:
-

I'm working on this.

> Improve test coverage for groupby.py and window.py.
> ---
>
> Key: SPARK-36515
> URL: https://issues.apache.org/jira/browse/SPARK-36515
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There is a lot of code not being tested in groupby.py and window.py, which 
> are the main implementation files for GroupBy and Rolling/Expanding.
> We should improve the test coverage as much as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36434) Implement DataFrame.lookup

2021-08-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151
 ] 

dgd_contributor edited comment on SPARK-36434 at 8/20/21, 10:44 AM:


Should we work on this? The docs show that DataFrame.lookup is deprecated: 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 


was (Author: dc-heros):
should we work on this? this docs show dataframe.lookup is deprecated 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 

> Implement DataFrame.lookup
> --
>
> Key: SPARK-36434
> URL: https://issues.apache.org/jira/browse/SPARK-36434
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36434) Implement DataFrame.lookup

2021-08-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151
 ] 

dgd_contributor commented on SPARK-36434:
-

Should we work on this? The docs show that DataFrame.lookup is deprecated: 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 

> Implement DataFrame.lookup
> --
>
> Key: SPARK-36434
> URL: https://issues.apache.org/jira/browse/SPARK-36434
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36387) Fix Series.astype from datetime to nullable string

2021-08-12 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397953#comment-17397953
 ] 

dgd_contributor commented on SPARK-36387:
-

Sorry about my late reply, please go ahead! This is my first time in PySpark.

> Fix Series.astype from datetime to nullable string
> --
>
> Key: SPARK-36387
> URL: https://issues.apache.org/jira/browse/SPARK-36387
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
> Attachments: image-2021-08-12-14-24-31-321.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36387) Fix Series.astype from datetime to nullable string

2021-08-03 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392648#comment-17392648
 ] 

dgd_contributor commented on SPARK-36387:
-

Can I work on this?

> Fix Series.astype from datetime to nullable string
> --
>
> Key: SPARK-36387
> URL: https://issues.apache.org/jira/browse/SPARK-36387
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36303) Refactor fourteenth set of 20 query execution errors to use error classes

2021-08-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391294#comment-17391294
 ] 

dgd_contributor commented on SPARK-36303:
-

working on this

> Refactor fourteenth set of 20 query execution errors to use error classes
> -
>
> Key: SPARK-36303
> URL: https://issues.apache.org/jira/browse/SPARK-36303
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the fourteenth set of 20.
> {code:java}
> cannotGetEventTimeWatermarkError
> cannotSetTimeoutTimestampError
> batchMetadataFileNotFoundError
> multiStreamingQueriesUsingPathConcurrentlyError
> addFilesWithAbsolutePathUnsupportedError
> microBatchUnsupportedByDataSourceError
> cannotExecuteStreamingRelationExecError
> invalidStreamingOutputModeError
> catalogPluginClassNotFoundError
> catalogPluginClassNotImplementedError
> catalogPluginClassNotFoundForCatalogError
> catalogFailToFindPublicNoArgConstructorError
> catalogFailToCallPublicNoArgConstructorError
> cannotInstantiateAbstractCatalogPluginClassError
> failedToInstantiateConstructorForCatalogError
> noSuchElementExceptionError
> noSuchElementExceptionError
> cannotMutateReadOnlySQLConfError
> cannotCloneOrCopyReadOnlySQLConfError
> cannotGetSQLConfInSchedulerEventLoopThreadError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36302) Refactor thirteenth set of 20 query execution errors to use error classes

2021-08-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391283#comment-17391283
 ] 

dgd_contributor commented on SPARK-36302:
-

working on this.

> Refactor thirteenth set of 20 query execution errors to use error classes
> -
>
> Key: SPARK-36302
> URL: https://issues.apache.org/jira/browse/SPARK-36302
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the thirteenth set of 20.
> {code:java}
> serDeInterfaceNotFoundError
> convertHiveTableToCatalogTableError
> cannotRecognizeHiveTypeError
> getTablesByTypeUnsupportedByHiveVersionError
> dropTableWithPurgeUnsupportedError
> alterTableWithDropPartitionAndPurgeUnsupportedError
> invalidPartitionFilterError
> getPartitionMetadataByFilterError
> unsupportedHiveMetastoreVersionError
> loadHiveClientCausesNoClassDefFoundError
> cannotFetchTablesOfDatabaseError
> illegalLocationClauseForViewPartitionError
> renamePathAsExistsPathError
> renameAsExistsPathError
> renameSrcPathNotFoundError
> failedRenameTempFileError
> legacyMetadataPathExistsError
> partitionColumnNotFoundInSchemaError
> stateNotDefinedOrAlreadyRemovedError
> cannotSetTimeoutDurationError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36301) Refactor twelfth set of 20 query execution errors to use error classes

2021-08-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391279#comment-17391279
 ] 

dgd_contributor commented on SPARK-36301:
-

working on this

> Refactor twelfth set of 20 query execution errors to use error classes
> --
>
> Key: SPARK-36301
> URL: https://issues.apache.org/jira/browse/SPARK-36301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the twelfth set of 20.
> {code:java}
> cannotRewriteDomainJoinWithConditionsError
> decorrelateInnerQueryThroughPlanUnsupportedError
> methodCalledInAnalyzerNotAllowedError
> cannotSafelyMergeSerdePropertiesError
> pairUnsupportedAtFunctionError
> onceStrategyIdempotenceIsBrokenForBatchError[TreeType
> structuralIntegrityOfInputPlanIsBrokenInClassError
> structuralIntegrityIsBrokenAfterApplyingRuleError
> ruleIdNotFoundForRuleError
> cannotCreateArrayWithElementsExceedLimitError
> indexOutOfBoundsOfArrayDataError
> malformedRecordsDetectedInRecordParsingError
> remoteOperationsUnsupportedError
> invalidKerberosConfigForHiveServer2Error
> parentSparkUIToAttachTabNotFoundError
> inferSchemaUnsupportedForHiveError
> requestedPartitionsMismatchTablePartitionsError
> dynamicPartitionKeyNotAmongWrittenPartitionPathsError
> cannotRemovePartitionDirError
> cannotCreateStagingDirError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36298) Refactor ninth set of 20 query execution errors to use error classes

2021-08-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391277#comment-17391277
 ] 

dgd_contributor commented on SPARK-36298:
-

working on this.

> Refactor ninth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36298
> URL: https://issues.apache.org/jira/browse/SPARK-36298
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the ninth set of 20.
> {code:java}
> unscaledValueTooLargeForPrecisionError
> decimalPrecisionExceedsMaxPrecisionError
> outOfDecimalTypeRangeError
> unsupportedArrayTypeError
> unsupportedJavaTypeError
> failedParsingStructTypeError
> failedMergingFieldsError
> cannotMergeDecimalTypesWithIncompatiblePrecisionAndScaleError
> cannotMergeDecimalTypesWithIncompatiblePrecisionError
> cannotMergeDecimalTypesWithIncompatibleScaleError
> cannotMergeIncompatibleDataTypesError
> exceedMapSizeLimitError
> duplicateMapKeyFoundError
> mapDataKeyArrayLengthDiffersFromValueArrayLengthError
> fieldDiffersFromDerivedLocalDateError
> failToParseDateTimeInNewParserError
> failToFormatDateTimeInNewFormatterError
> failToRecognizePatternAfterUpgradeError
> failToRecognizePatternError
> cannotCastUTF8StringToDataTypeError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36299) Refactor tenth set of 20 query execution errors to use error classes

2021-08-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391278#comment-17391278
 ] 

dgd_contributor commented on SPARK-36299:
-

working on this.

> Refactor tenth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36299
> URL: https://issues.apache.org/jira/browse/SPARK-36299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the tenth set of 20.
> {code:java}
> registeringStreamingQueryListenerError
> concurrentQueryInstanceError
> cannotParseJsonArraysAsStructsError
> cannotParseStringAsDataTypeError
> failToParseEmptyStringForDataTypeError
> failToParseValueForDataTypeError
> rootConverterReturnNullError
> cannotHaveCircularReferencesInBeanClassError
> cannotHaveCircularReferencesInClassError
> cannotUseInvalidJavaIdentifierAsFieldNameError
> cannotFindEncoderForTypeError
> attributesForTypeUnsupportedError
> schemaForTypeUnsupportedError
> cannotFindConstructorForTypeError
> paramExceedOneCharError
> paramIsNotIntegerError
> paramIsNotBooleanValueError
> foundNullValueForNotNullableFieldError
> malformedCSVRecordError
> elementsOfTupleExceedLimitError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions

2021-07-29 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390283#comment-17390283
 ] 

dgd_contributor commented on SPARK-36343:
-

[~hyukjin.kwon] In my case, I'm using Spark Thrift Server version 2.4.4 in 
production, which integrates with Apache Ranger through 
[https://github.com/yaooqinn/spark-ranger] via the conf:
{code:java}
spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension{code}
and now we want to integrate Spark SQL with Atlas through SAC 
[https://github.com/hortonworks-spark/spark-atlas-connector], which *does not 
support Spark 3.x*, via the conf:
{code:java}
spark.sql.extensions com.hortonworks.spark.atlas.sql.SparkExtension{code}
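
On Spark 2.4.x, one workaround is a single adapter class that applies both extensions, since spark.sql.extensions there accepts only one class name. A hypothetical sketch (the class name CombinedExtensions is made up; it relies only on the contract that any spark.sql.extensions entry is a no-arg-constructible Function1[SparkSessionExtensions, Unit]):
{code:scala}
import org.apache.spark.sql.SparkSessionExtensions

class CombinedExtensions extends (SparkSessionExtensions => Unit) {
  // Both delegates are regular spark.sql.extensions classes, so they are
  // Function1[SparkSessionExtensions, Unit] by contract.
  private val delegates: Seq[SparkSessionExtensions => Unit] = Seq(
    new org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension,
    new com.hortonworks.spark.atlas.sql.SparkExtension
  )
  override def apply(extensions: SparkSessionExtensions): Unit =
    delegates.foreach(_.apply(extensions))
}
{code}
and then point spark.sql.extensions at the fully qualified name of that one class.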
 

 

> spark 2.4.x spark.sql.extensions support multiple extensions
> 
>
> Key: SPARK-36343
> URL: https://issues.apache.org/jira/browse/SPARK-36343
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: dgd_contributor
>Priority: Major
>
> Like SPARK-26493



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions

2021-07-29 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36343:

Description: Like SPARK-26493  (was: Like issue [SPARK-26493])

> spark 2.4.x spark.sql.extensions support multiple extensions
> 
>
> Key: SPARK-36343
> URL: https://issues.apache.org/jira/browse/SPARK-36343
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: dgd_contributor
>Priority: Major
>
> Like SPARK-26493



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36343) spark 2.4.x spark.sql.extensions support multiple extensions

2021-07-29 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36343:
---

 Summary: spark 2.4.x spark.sql.extensions support multiple 
extensions
 Key: SPARK-36343
 URL: https://issues.apache.org/jira/browse/SPARK-36343
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.8
Reporter: dgd_contributor


Like issue [SPARK-26493]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388387#comment-17388387
 ] 

dgd_contributor commented on SPARK-36099:
-

Sorry, I wasn't checking the comments recently. I've done the work for the Spark 
core part but didn't create a pull request because I've been waiting for approval 
in SPARK-36095.

Again, truly sorry for wasting your time. [~Shockang]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387830#comment-17387830
 ] 

dgd_contributor commented on SPARK-36099:
-

[~Shockang] how is your progress? I already have the work done in my local repo 
and will make a pull request soon.

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36291) Refactor second set of 20 query execution errors to use error classes

2021-07-26 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387720#comment-17387720
 ] 

dgd_contributor commented on SPARK-36291:
-

working on this

> Refactor second set of 20 query execution errors to use error classes
> -
>
> Key: SPARK-36291
> URL: https://issues.apache.org/jira/browse/SPARK-36291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the second set of 20.
> {code:java}
> inputTypeUnsupportedError
> invalidFractionOfSecondError
> overflowInSumOfDecimalError
> overflowInIntegralDivideError
> mapSizeExceedArraySizeWhenZipMapError
> copyNullFieldNotAllowedError
> literalTypeUnsupportedError
> noDefaultForDataTypeError
> doGenCodeOfAliasShouldNotBeCalledError
> orderedOperationUnsupportedByDataTypeError
> regexGroupIndexLessThanZeroError
> regexGroupIndexExceedGroupCountError
> invalidUrlError
> dataTypeOperationUnsupportedError
> mergeUnsupportedByWindowFunctionError
> dataTypeUnexpectedError
> typeUnsupportedError
> negativeValueUnexpectedError
> addNewFunctionMismatchedWithFunctionError
> cannotGenerateCodeForUncomparableTypeError
> {code}
> For more detail, see the parent ticket SPARK-36094.
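
For context, the refactoring pattern across these tickets is to route each inlined throw through a named factory method in QueryExecutionErrors, so that messages can later be keyed by error class. An illustrative sketch (simplified; not the exact Spark source):
{code:scala}
import org.apache.spark.sql.types.DataType

// Before, scattered through execution code:
//   throw new UnsupportedOperationException(s"Unexpected data type $dataType")
// After, one grouped factory method per error, e.g. for dataTypeUnexpectedError:
def dataTypeUnexpectedError(dataType: DataType): Throwable = {
  new UnsupportedOperationException(s"Unexpected data type: ${dataType.catalogString}")
}
{code}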



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters

2021-07-21 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36229:

Description: 
1/ SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
inconsistency in behaviour where the returned value is different above the 64 
char threshold.

 
{noformat}
scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
+---+
|conv(repeat(?, 64), 10, 16)|
+---+
|                          0|
+---+




scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
+---+
|conv(repeat(?, 65), 10, 16)|
+---+
|           |
+---+




scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
++
|conv(repeat(?, 65), 10, -16)|
++
|                          -1|
++




scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
++
|conv(repeat(?, 64), 10, -16)|
++
|                           0|
++{noformat}
 

2/ conv should return a result equal to the max unsigned long value in base toBase 
when there is an overflow
{code:java}
scala> spark.sql("select conv('aaa0aaa0a', 16, 10)").show 
// which should be 18446744073709551615

+---+
|conv(aaa0aaa0a, 16, 10)|
+---+
|   12297828695278266890|
+---+
{code}

  was:
SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
inconsistency in behaviour where the returned value is different above the 64 
char threshold.

 
{noformat}
scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
+---+
|conv(repeat(?, 64), 10, 16)|
+---+
|                          0|
+---+




scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
+---+
|conv(repeat(?, 65), 10, 16)|
+---+
|           |
+---+




scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
++
|conv(repeat(?, 65), 10, -16)|
++
|                          -1|
++




scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
++
|conv(repeat(?, 64), 10, -16)|
++
|                           0|
++{noformat}


> conv() inconsistently handles invalid strings with > 64 invalid characters
> --
>
> Key: SPARK-36229
> URL: https://issues.apache.org/jira/browse/SPARK-36229
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tim Armstrong
>Priority: Major
>
> 1/ SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
> inconsistency in behaviour where the returned value is different above the 64 
> char threshold.
>  
> {noformat}
> scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
> +---+
> |conv(repeat(?, 64), 10, 16)|
> +---+
> |                          0|
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
> +---+
> |conv(repeat(?, 65), 10, 16)|
> +---+
> |           |
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
> ++
> |conv(repeat(?, 65), 10, -16)|
> ++
> |                          -1|
> ++
> scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
> ++
> |conv(repeat(?, 64), 10, -16)|
> ++
> |                           0|
> ++{noformat}
>  
> 2/ conv should return a result equal to the max unsigned long value in base toBase 
> when there is an overflow
> {code:java}
> scala> spark.sql("select conv('aaa0aaa0a', 16, 10)").show 
> // which should be 18446744073709551615
> +---+
> |conv(aaa0aaa0a, 16, 10)|
> +---+
> |   12297828695278266890|
> +---+
> {code}
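
For reference on the expected value above: conv() works in 64-bit unsigned arithmetic, and 18446744073709551615 is 2^64 - 1, i.e. what a signed -1L prints as when rendered unsigned. A one-line check (plain JVM, not Spark code):
{code:scala}
// The unsigned-long saturation value quoted in the description:
println(java.lang.Long.toUnsignedString(-1L))  // 18446744073709551615
{code}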



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters

2021-07-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384666#comment-17384666
 ] 

dgd_contributor edited comment on SPARK-36229 at 7/21/21, 5:44 AM:
---

After looking closely, I found out that the overflow check in encode is wrong and 
needs to be fixed too.

For example:
{code:java}
scala> spark.sql("select conv('aaa0aaa0a', 16, 10)").show

+---+
|conv(aaa0aaa0a, 16, 10)|
+---+
|   12297828695278266890|
+---+{code}
which should be 18446744073709551615

 

I will raise a pull request soon


was (Author: dc-heros):
After look closely, I found out that the overflow check in encode is wrong and 
need to work on too.

For example:
{code:java}
scala> spark.sql("select conv('aaa0aaa0a', 16, 10)").show

+---+
|conv(aaa0aaa0a, 16, 10)|
+---+
|   12297828695278266890|
+---+

which should be 18446744073709551615{code}
I will raise a pull request soon

> conv() inconsistently handles invalid strings with > 64 invalid characters
> --
>
> Key: SPARK-36229
> URL: https://issues.apache.org/jira/browse/SPARK-36229
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tim Armstrong
>Priority: Major
>
> SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
> inconsistency in behaviour where the returned value is different above the 64 
> char threshold.
>  
> {noformat}
> scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
> +---+
> |conv(repeat(?, 64), 10, 16)|
> +---+
> |                          0|
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
> +---+
> |conv(repeat(?, 65), 10, 16)|
> +---+
> |           |
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
> ++
> |conv(repeat(?, 65), 10, -16)|
> ++
> |                          -1|
> ++
> scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
> ++
> |conv(repeat(?, 64), 10, -16)|
> ++
> |                           0|
> ++{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters

2021-07-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384666#comment-17384666
 ] 

dgd_contributor commented on SPARK-36229:
-

After looking closely, I found out that the overflow check in encode is wrong and 
needs to be fixed too.

For example:
{code:java}
scala> spark.sql("select conv('aaa0aaa0a', 16, 10)").show

+---+
|conv(aaa0aaa0a, 16, 10)|
+---+
|   12297828695278266890|
+---+

which should be 18446744073709551615{code}
I will raise a pull request soon

> conv() inconsistently handles invalid strings with > 64 invalid characters
> --
>
> Key: SPARK-36229
> URL: https://issues.apache.org/jira/browse/SPARK-36229
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tim Armstrong
>Priority: Major
>
> SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
> inconsistency in behaviour where the returned value is different above the 64 
> char threshold.
>  
> {noformat}
> scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
> +---+
> |conv(repeat(?, 64), 10, 16)|
> +---+
> |                          0|
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
> +---+
> |conv(repeat(?, 65), 10, 16)|
> +---+
> |           |
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
> ++
> |conv(repeat(?, 65), 10, -16)|
> ++
> |                          -1|
> ++
> scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
> ++
> |conv(repeat(?, 64), 10, -16)|
> ++
> |                           0|
> ++{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36229) conv() inconsistently handles invalid strings with > 64 invalid characters

2021-07-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384610#comment-17384610
 ] 

dgd_contributor commented on SPARK-36229:
-

Thanks, I will look into this.

 

> conv() inconsistently handles invalid strings with > 64 invalid characters
> --
>
> Key: SPARK-36229
> URL: https://issues.apache.org/jira/browse/SPARK-36229
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tim Armstrong
>Priority: Major
>
> SPARK-33428 fixed ArrayIndexOutOfBoundsException but introduced a new 
> inconsistency in behaviour where the returned value is different above the 64 
> char threshold.
>  
> {noformat}
> scala> spark.sql("select conv(repeat('?', 64), 10, 16)").show
> +---+
> |conv(repeat(?, 64), 10, 16)|
> +---+
> |                          0|
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, 16)").show
> +---+
> |conv(repeat(?, 65), 10, 16)|
> +---+
> |           |
> +---+
> scala> spark.sql("select conv(repeat('?', 65), 10, -16)").show
> ++
> |conv(repeat(?, 65), 10, -16)|
> ++
> |                          -1|
> ++
> scala> spark.sql("select conv(repeat('?', 64), 10, -16)").show
> ++
> |conv(repeat(?, 64), 10, -16)|
> ++
> |                           0|
> ++{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-07-14 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380396#comment-17380396
 ] 

dgd_contributor commented on SPARK-34532:
-

There is an addExact function in IntervalUtils to handle overflow.
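
A quick illustration of why an addExact-style check matters here (plain JVM behavior, not the IntervalUtils code itself):
{code:scala}
val a = Long.MaxValue
// Plain addition silently wraps around to Long.MinValue:
println(a + 1L)  // -9223372036854775808
// Math.addExact surfaces the problem instead of corrupting the value:
println(java.lang.Math.addExact(a, 1L))  // throws ArithmeticException: long overflow
{code}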

> IntervalUtils.add() may result in 'long overflow'
> -
>
> Key: SPARK-34532
> URL: https://issues.apache.org/jira/browse/SPARK-34532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
>Reporter: Ted Yu
>Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
> 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.multiplyExact(Math.java:892)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> {code}
> {code}
> 19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 14744.0 (TID 16705)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.addExact(Math.java:809)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
> at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
> at 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
> {code}
> This likely was caused by the following line:
> {code}
> val microseconds = left.microseconds + right.microseconds
> {code}
> We should check whether the addition would produce overflow before adding.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36095) Group exception messages in core/rdd

2021-07-12 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379511#comment-17379511
 ] 

dgd_contributor commented on SPARK-36095:
-

[~allisonwang-db] I would like to work on this. Can you specify a general rule 
for Spark core? Would we create a new errors package in org.apache.spark?

> Group exception messages in core/rdd 
> -
>
> Key: SPARK-36095
> URL: https://issues.apache.org/jira/browse/SPARK-36095
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/rdd'
> || Filename||   Count ||
> | BlockRDD.scala  |   2 |
> | DoubleRDDFunctions.scala|   1 |
> | EmptyRDD.scala  |   1 |
> | HadoopRDD.scala |   1 |
> | LocalCheckpointRDD.scala|   1 |
> | NewHadoopRDD.scala  |   1 |
> | PairRDDFunctions.scala  |   7 |
> | PipedRDD.scala  |   1 |
> | RDD.scala   |   8 |
> | ReliableCheckpointRDD.scala |   4 |
> | ReliableRDDCheckpointData.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36022) Respect interval fields in extract

2021-07-07 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376334#comment-17376334
 ] 

dgd_contributor commented on SPARK-36022:
-

As they are CalendarInterval values, which are represented by months, days and 
microseconds, I think "2021 years" is equal to "2021 years and 0 months"; why 
should the last command fail?
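
A small illustration of that representation (the point is only that the year information collapses into a single months field, so the two intervals are indistinguishable once constructed):
{code:scala}
import org.apache.spark.unsafe.types.CalendarInterval

// INTERVAL '2021' YEAR and INTERVAL '2021-0' YEAR TO MONTH both normalize
// to the same (months, days, microseconds) triple: (24252, 0, 0).
val i = new CalendarInterval(2021 * 12, 0, 0L)
println(i.months)  // 24252
{code}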

> Respect interval fields in extract
> --
>
> Key: SPARK-36022
> URL: https://issues.apache.org/jira/browse/SPARK-36022
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Extract should process only existing fields of interval types. For example:
> {code:sql}
> spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH);
> 11
> spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021' YEAR);
> 0
> {code}
> The last command should fail as the month field doesn't present in INTERVAL 
> YEAR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372566#comment-17372566
 ] 

dgd_contributor commented on SPARK-35955:
-

I will raise a pull request soon

 

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +---+
> |avg(decNum)|
> +---+
> |null   |
> +---+{code}
>  
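
The null comes from the sum overflowing first. The underlying mechanism, sketched with Spark's Decimal directly (illustrative only; this is not the Average code path):
{code:scala}
import org.apache.spark.sql.types.Decimal

// changePrecision returns false when the value no longer fits the target
// precision; under ANSI mode that must surface as an error, not a null.
val d = Decimal(BigDecimal("1" + "0" * 37))  // a 38-digit value
println(d.changePrecision(10, 0))            // false: overflow
{code}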



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35841) Casting string to decimal type doesn't work if the sum of the digits is greater than 38

2021-06-21 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366924#comment-17366924
 ] 

dgd_contributor commented on SPARK-35841:
-

I would like to work on this

> Casting string to decimal type doesn't work if the sum of the digits is 
> greater than 38
> ---
>
> Key: SPARK-35841
> URL: https://issues.apache.org/jira/browse/SPARK-35841
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Tested in a Kubernetes Cluster with Spark 3.1.1 and 
> Spark 3.1.2 images
> (Hadoop 3.2.1, Python 3.9, Scala 2.12.13)
>Reporter: Roberto Gelsi
>Priority: Major
>
> Since Spark 3.1.1, NULL is returned when casting a string with many decimal 
> places to a decimal type. If the sum of the digits before and after the 
> decimal point is less than 39, a value is returned. From 39 digits, however, 
> NULL is returned.
> This worked until Spark 3.0.X.
> Code to reproduce:
> * A string with 2 decimal places in front of the decimal point and 37 decimal 
> places after the decimal point returns null
> {code:python}
> data = ['28.92599983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +-+
> |value|
> +-+
> |null |
> +-+
>  
> * A string with 2 decimal places in front of the decimal point and 36 decimal 
> places after the decimal point returns the number as decimal
> {code:python}
> data = ['28.9259998379962562466971576213']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> ++
> |value   |
> ++
> |28.92600|
> ++
> * A string with 1 decimal place in front of the decimal point and 37 decimal 
> places after the decimal point returns the number as decimal
> {code:python}
> data = ['2.92599983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +---+
> |value  |
> +---+
> |2.92600|
> +---+
>  
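
The 38/39-digit boundary in these examples lines up with Spark's decimal precision cap, which the stricter string-to-decimal parsing in 3.1.x runs into before any rounding happens (an observation on the reported threshold, not a full root-cause analysis). A quick check of the constant:
{code:scala}
import org.apache.spark.sql.types.DecimalType
println(DecimalType.MAX_PRECISION)  // 38
{code}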



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33603) Group exception messages in execution/command

2021-06-16 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364629#comment-17364629
 ] 

dgd_contributor commented on SPARK-33603:
-

[~beliefer] can I work on this, if you don't mind?

 

> Group exception messages in execution/command
> -
>
> Key: SPARK-33603
> URL: https://issues.apache.org/jira/browse/SPARK-33603
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Priority: Major
>
> '/core/src/main/scala/org/apache/spark/sql/execution/command'
> || Filename  ||   Count ||
> | AnalyzeColumnCommand.scala|   3 |
> | AnalyzePartitionCommand.scala |   2 |
> | AnalyzeTableCommand.scala |   1 |
> | SetCommand.scala  |   2 |
> | createDataSourceTables.scala  |   2 |
> | ddl.scala |   1 |
> | functions.scala   |   4 |
> | tables.scala  |   7 |
> | views.scala   |   3 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35064) Group exception messages in spark/sql (catalyst)

2021-06-14 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363375#comment-17363375
 ] 

dgd_contributor commented on SPARK-35064:
-

I would like to work on this

> Group exception messages in spark/sql (catalyst)
> 
>
> Key: SPARK-35064
> URL: https://issues.apache.org/jira/browse/SPARK-35064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Priority: Major
>
> Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35622) DataFrame's count function do not need groupBy and avoid shuffle

2021-06-14 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363336#comment-17363336
 ] 

dgd_contributor commented on SPARK-35622:
-

I ran a benchmark on my computer: df.rdd.count()'s execution time is about 1/4 of 
df.count()'s.

I created a df with 10 rows and ran 1000 loops; df.rdd.count() execution time 
is 2671551 nanoseconds, df.count() execution time is 116798269100 nanoseconds.
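
A rough sketch of that measurement, assuming a running spark-shell session (loop count and data size arbitrary; not a rigorous JMH-style benchmark):
{code:scala}
val df = spark.range(10).toDF("id")

def time(f: => Unit): Long = {
  val t0 = System.nanoTime(); f; System.nanoTime() - t0
}

val rddNs = time { (1 to 1000).foreach(_ => df.rdd.count()) }
val dfNs  = time { (1 to 1000).foreach(_ => df.count()) }
println(s"df.rdd.count: $rddNs ns, df.count: $dfNs ns")
{code}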

> DataFrame's count function do not need groupBy and avoid shuffle
> 
>
> Key: SPARK-35622
> URL: https://issues.apache.org/jira/browse/SPARK-35622
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: xiepengjie
>Priority: Major
>
> Use `df.rdd.count()` to replace `df.count()`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-14 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362824#comment-17362824
 ] 

dgd_contributor commented on SPARK-35563:
-

After looking into this, I found out that rowNumber in RowNumberLike is IntegerType, 
but I'm not sure whether this is a bug. Should I create a pull request?
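
The silent truncation in the repro below is consistent with 32-bit wrap-around: once a row counter held in an Int passes Int.MaxValue it goes negative, which is why exactly one row lands in the (rn < 0) bucket. A one-liner illustration:
{code:scala}
println(Int.MaxValue)      // 2147483647
println(Int.MaxValue + 1)  // -2147483648 (wraps silently, no error)
{code}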

> [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows
> --
>
> Key: SPARK-35563
> URL: https://issues.apache.org/jira/browse/SPARK-35563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2
>Reporter: Robert Joseph Evans
>Priority: Blocker
>  Labels: data-loss
>
> I think this impacts a lot more versions of Spark, but I don't know for sure 
> because it takes a long time to test. As a part of doing corner case 
> validation testing for spark rapids I found that if a window function has 
> more than {{Int.MaxValue + 1}} rows the result is silently truncated to that 
> many rows. I have only tested this on 3.0.2 with {{row_number}}, but I 
> suspect it will impact others as well. This is a really rare corner case, but 
> because it is silent data corruption I personally think it is quite serious.
> {code:scala}
> import org.apache.spark.sql.expressions.Window
> val windowSpec = Window.partitionBy("a").orderBy("b")
> val df = spark.range(Int.MaxValue.toLong + 100).selectExpr(s"1 as a", "id as 
> b")
> spark.time(df.select(col("a"), col("b"), 
> row_number().over(windowSpec).alias("rn")).orderBy(desc("a"), 
> desc("b")).select((col("rn") < 0).alias("dir")).groupBy("dir").count.show(20))
> +-+--+
>   
> |  dir| count|
> +-+--+
> |false|2147483647|
> | true| 1|
> +-+--+
> Time taken: 1139089 ms
> Int.MaxValue.toLong + 100
> res15: Long = 2147483747
> 2147483647L + 1
> res16: Long = 2147483648
> {code}
> I had to make sure that I ran the above with at least 64GiB of heap for the 
> executor (I did it in local mode and it worked, but took forever to run)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32891) Enhance UTF8String.trim

2021-06-14 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362766#comment-17362766
 ] 

dgd_contributor commented on SPARK-32891:
-

After looking into this and running a few benchmarks, I don't think there is a big 
difference in execution time between [UTF8String].trim().toString() and 
[UTF8String].toString().trim(): for 100 loops on my computer, 
[UTF8String].trim().toString() execution time was 13168899600 nanoseconds and 
[UTF8String].toString().trim() was 11813350700 nanoseconds.
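
A rough sketch of that comparison (payload and loop count arbitrary; not the JMH-style benchmark Spark's own suites use):
{code:scala}
import org.apache.spark.unsafe.types.UTF8String

val s = UTF8String.fromString("   some padded value   ")

def time(f: => Unit): Long = {
  val t0 = System.nanoTime(); f; System.nanoTime() - t0
}

val trimFirst = time { (1 to 100).foreach(_ => s.trim().toString) }
val trimLast  = time { (1 to 100).foreach(_ => s.toString.trim) }
println(s"trim().toString: $trimFirst ns, toString.trim: $trimLast ns")
{code}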

> Enhance UTF8String.trim
> ---
>
> Key: SPARK-32891
> URL: https://issues.apache.org/jira/browse/SPARK-32891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> It sounds like {{UTF8String.trim}} is not implemented well. We may need to 
> look at how {{java.lang.String.trim}} is implemented.
> Please see comment:
>  
> [https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675]
> [https://github.com/apache/spark/pull/29731#discussion_r487709672]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35622) DataFrame's count function do not need groupBy and avoid shuffle

2021-06-13 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362681#comment-17362681
 ] 

dgd_contributor commented on SPARK-35622:
-

Hi, could you explain more about this issue? Where should we replace df.count() 
with df.rdd.count()?

 

> DataFrame's count function do not need groupBy and avoid shuffle
> 
>
> Key: SPARK-35622
> URL: https://issues.apache.org/jira/browse/SPARK-35622
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: xiepengjie
>Priority: Major
>
> Use `df.rdd.count()` to replace `df.count()`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org