[jira] [Updated] (SPARK-40265) Fix the inconsistent behavior for Index.intersection.

2022-08-30 Thread Haejoon Lee (Jira)


     [ https://issues.apache.org/jira/browse/SPARK-40265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-40265:

Description: 
There is inconsistent behavior in `Index.intersection` between pandas and 
pandas API on Spark when `other` is a list of tuples, as shown below:


{code:python}
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection([(1, 2), (3, 4)]).sort_values()
MultiIndex([], )
>>> pidx.intersection([(1, 2), (3, 4)]).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.

  was:
There is inconsistent behavior on Index.intersection for pandas API on Spark as 
below:


{code:python}
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection([(1, 2), (3, 4)]).sort_values()
MultiIndex([], )
>>> pidx.intersection([(1, 2), (3, 4)]).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.


> Fix the inconsistent behavior for Index.intersection.
> -----------------------------------------------------
>
>                 Key: SPARK-40265
>                 URL: https://issues.apache.org/jira/browse/SPARK-40265
>             Project: Spark
>          Issue Type: Test
>          Components: Pandas API on Spark
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
>
> There is inconsistent behavior in `Index.intersection` between pandas and 
> pandas API on Spark when `other` is a list of tuples, as shown below:
> {code:python}
> >>> pidx
> Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
> >>> psidx
> Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
> >>> psidx.intersection([(1, 2), (3, 4)]).sort_values()
> MultiIndex([], )
> >>> pidx.intersection([(1, 2), (3, 4)]).sort_values()
> Traceback (most recent call last):
> ...
> ValueError: Names should be list-like for a MultiIndex
> {code}
> We should fix it to follow pandas.
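
For illustration only, the kind of check that would make the two results agree could be sketched as below. This is not the actual Spark patch: `is_list_of_tuples` and `check_intersection_arg` are hypothetical helper names, and the sketch assumes that a list of tuples is treated as MultiIndex data, so a scalar index name such as 'Koalas' cannot serve as the list of level names. It uses only the standard library and does not call pandas or pyspark.

```python
from typing import Any


def is_list_of_tuples(other: Any) -> bool:
    """Return True for a non-empty list whose items are all tuples."""
    return (
        isinstance(other, list)
        and len(other) > 0
        and all(isinstance(item, tuple) for item in other)
    )


def check_intersection_arg(other: Any, name: Any) -> None:
    """Hypothetical guard that raises the error shown in the pandas traceback.

    Assumption: a list of tuples is interpreted as MultiIndex data, and a
    scalar name is not a valid list of level names for it.
    """
    name_is_list_like = isinstance(name, (list, tuple))
    if is_list_of_tuples(other) and not name_is_list_like:
        raise ValueError("Names should be list-like for a MultiIndex")


# A list of tuples combined with a scalar name is rejected.
try:
    check_intersection_arg([(1, 2), (3, 4)], "Koalas")
except ValueError as exc:
    print(exc)

# A flat list passes the guard and would go on to the normal intersection.
check_intersection_arg([1, 2], "Koalas")
```

With a guard of this shape in front of the pandas-on-Spark code path, the `psidx` call in the example would raise the same ValueError as the `pidx` call instead of returning an empty MultiIndex.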



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40265) Fix the inconsistent behavior for Index.intersection.

2022-08-30 Thread Haejoon Lee (Jira)


     [ https://issues.apache.org/jira/browse/SPARK-40265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-40265:

Description: 
There is inconsistent behavior on Index.intersection for pandas API on Spark as 
below:


{code:python}
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection([(1, 2), (3, 4)]).sort_values()
MultiIndex([], )
>>> pidx.intersection([(1, 2), (3, 4)]).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.

  was:
There is inconsistent behavior on Index.intersection for pandas API on Spark as 
below:


{code:python}
>>> other = [(1, 2), (3, 4)]
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection(other).sort_values()
MultiIndex([], )
>>> pidx.intersection(other).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.








[jira] [Updated] (SPARK-40265) Fix the inconsistent behavior for Index.intersection.

2022-08-30 Thread Haejoon Lee (Jira)


     [ https://issues.apache.org/jira/browse/SPARK-40265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-40265:

Description: 
There is inconsistent behavior on Index.intersection for pandas API on Spark as 
below:


{code:python}
>>> other = [(1, 2), (3, 4)]
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection(other).sort_values()
MultiIndex([], )
>>> pidx.intersection(other).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.

  was:
There is inconsistent behavior on Index.intersection for pandas API on Spark as 
below:


{code:python}
>>> pidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx
Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas')
>>> psidx.intersection(other).sort_values()
MultiIndex([], )
>>> pidx.intersection(other).sort_values()
Traceback (most recent call last):
...
ValueError: Names should be list-like for a MultiIndex
{code}

We should fix it to follow pandas.




