[jira] [Resolved] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42007.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39525
[https://github.com/apache/spark/pull/39525]

> Reuse pyspark.sql.tests.test_group test cases
> -
>
> Key: SPARK-42007
> URL: https://issues.apache.org/jira/browse/SPARK-42007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42007:


Assignee: Hyukjin Kwon

> Reuse pyspark.sql.tests.test_group test cases
> -
>
> Key: SPARK-42007
> URL: https://issues.apache.org/jira/browse/SPARK-42007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Created] (SPARK-42025) Optimize logs for removeBlocks and removeShuffleMerge RPC

2023-01-12 Thread Wan Kun (Jira)
Wan Kun created SPARK-42025:
---

 Summary: Optimize logs for removeBlocks and removeShuffleMerge RPC
 Key: SPARK-42025
 URL: https://issues.apache.org/jira/browse/SPARK-42025
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Wan Kun


1. Change the error log level for the RemoveBlocks RPC.
2. Add error logs for the RemoveShuffleMerge RPC.

Discussion link: https://github.com/apache/spark/pull/37922#discussion_r1067551485







[jira] [Created] (SPARK-42026) Protobuf serializer for AppSummary and PoolData

2023-01-12 Thread Yang Jie (Jira)
Yang Jie created SPARK-42026:


 Summary: Protobuf serializer for AppSummary and PoolData
 Key: SPARK-42026
 URL: https://issues.apache.org/jira/browse/SPARK-42026
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Assigned] (SPARK-42026) Protobuf serializer for AppSummary and PoolData

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42026:


Assignee: (was: Apache Spark)

> Protobuf serializer for AppSummary and PoolData
> ---
>
> Key: SPARK-42026
> URL: https://issues.apache.org/jira/browse/SPARK-42026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-42026) Protobuf serializer for AppSummary and PoolData

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42026:


Assignee: Apache Spark

> Protobuf serializer for AppSummary and PoolData
> ---
>
> Key: SPARK-42026
> URL: https://issues.apache.org/jira/browse/SPARK-42026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42026) Protobuf serializer for AppSummary and PoolData

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675861#comment-17675861
 ] 

Apache Spark commented on SPARK-42026:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39530

> Protobuf serializer for AppSummary and PoolData
> ---
>
> Key: SPARK-42026
> URL: https://issues.apache.org/jira/browse/SPARK-42026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42008:


Assignee: Hyukjin Kwon

> Reuse pyspark.sql.tests.test_datasources test cases 
> 
>
> Key: SPARK-42008
> URL: https://issues.apache.org/jira/browse/SPARK-42008
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42008.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39526
[https://github.com/apache/spark/pull/39526]

> Reuse pyspark.sql.tests.test_datasources test cases 
> 
>
> Key: SPARK-42008
> URL: https://issues.apache.org/jira/browse/SPARK-42008
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42003) Reduce duplicate code in ResolveGroupByAll

2023-01-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42003.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39523
[https://github.com/apache/spark/pull/39523]

> Reduce duplicate code in ResolveGroupByAll
> --
>
> Key: SPARK-42003
> URL: https://issues.apache.org/jira/browse/SPARK-42003
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42009.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39527
[https://github.com/apache/spark/pull/39527]

> Reuse pyspark.sql.tests.test_serde test cases 
> --
>
> Key: SPARK-42009
> URL: https://issues.apache.org/jira/browse/SPARK-42009
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42009:


Assignee: Hyukjin Kwon

> Reuse pyspark.sql.tests.test_serde test cases 
> --
>
> Key: SPARK-42009
> URL: https://issues.apache.org/jira/browse/SPARK-42009
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-41989) PYARROW_IGNORE_TIMEZONE warning can break application logging setup

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41989.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39516
[https://github.com/apache/spark/pull/39516]

> PYARROW_IGNORE_TIMEZONE warning can break application logging setup
> ---
>
> Key: SPARK-41989
> URL: https://issues.apache.org/jira/browse/SPARK-41989
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.3
> Environment: python 3.9 env with pyspark installed
>Reporter: Stefaan Lippens
>Assignee: Stefaan Lippens
>Priority: Major
> Fix For: 3.4.0
>
>
> In {{python/pyspark/pandas/__init__.py}} there is currently a warning when 
> the {{PYARROW_IGNORE_TIMEZONE}} env var is not set 
> (https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):
> {code:python}
> import logging
> logging.warning(
>     "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "...
> {code}
> The {{logging.warning()}} call silently calls {{logging.basicConfig()}} (at 
> least in Python 3.9, which I tried).
> (FYI: something like {{logging.getLogger(...).warning()}} would not make this 
> silent call.)
> This has a side effect that is very hard to figure out: importing 
> {{pyspark.pandas}} (directly or indirectly) might break your logging setup 
> (if PYARROW_IGNORE_TIMEZONE is not set).
> A very basic example (assuming PYARROW_IGNORE_TIMEZONE is not set):
> {code:python}
> import logging
> import pyspark.pandas
> logging.basicConfig(level=logging.DEBUG)
> logger = logging.getLogger("test")
> logger.warning("I warn you")
> logger.debug("I debug you")
> {code}
> This will only produce the warning line, not the debug line.
> Removing the {{import pyspark.pandas}} makes the debug line appear.
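
A minimal sketch of the fix direction hinted at above (an illustration, not the actual patch; the environment-variable guard and logger name are assumptions):

{code:python}
import logging
import os

# A named logger avoids the silent logging.basicConfig() call that a
# module-level logging.warning(...) performs on the root logger.
logger = logging.getLogger(__name__)

if "PYARROW_IGNORE_TIMEZONE" not in os.environ:
    logger.warning(
        "'PYARROW_IGNORE_TIMEZONE' environment variable was not set."
    )
{code}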






[jira] [Assigned] (SPARK-41989) PYARROW_IGNORE_TIMEZONE warning can break application logging setup

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41989:


Assignee: Stefaan Lippens

> PYARROW_IGNORE_TIMEZONE warning can break application logging setup
> ---
>
> Key: SPARK-41989
> URL: https://issues.apache.org/jira/browse/SPARK-41989
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.3
> Environment: python 3.9 env with pyspark installed
>Reporter: Stefaan Lippens
>Assignee: Stefaan Lippens
>Priority: Major
>
> In {{python/pyspark/pandas/__init__.py}} there is currently a warning when 
> the {{PYARROW_IGNORE_TIMEZONE}} env var is not set 
> (https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):
> {code:python}
> import logging
> logging.warning(
>     "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "...
> {code}
> The {{logging.warning()}} call silently calls {{logging.basicConfig()}} (at 
> least in Python 3.9, which I tried).
> (FYI: something like {{logging.getLogger(...).warning()}} would not make this 
> silent call.)
> This has a side effect that is very hard to figure out: importing 
> {{pyspark.pandas}} (directly or indirectly) might break your logging setup 
> (if PYARROW_IGNORE_TIMEZONE is not set).
> A very basic example (assuming PYARROW_IGNORE_TIMEZONE is not set):
> {code:python}
> import logging
> import pyspark.pandas
> logging.basicConfig(level=logging.DEBUG)
> logger = logging.getLogger("test")
> logger.warning("I warn you")
> logger.debug("I debug you")
> {code}
> This will only produce the warning line, not the debug line.
> Removing the {{import pyspark.pandas}} makes the debug line appear.






[jira] [Updated] (SPARK-41989) PYARROW_IGNORE_TIMEZONE warning can break application logging setup

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41989:
-
Fix Version/s: 3.2.4
   3.3.2

> PYARROW_IGNORE_TIMEZONE warning can break application logging setup
> ---
>
> Key: SPARK-41989
> URL: https://issues.apache.org/jira/browse/SPARK-41989
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.3
> Environment: python 3.9 env with pyspark installed
>Reporter: Stefaan Lippens
>Assignee: Stefaan Lippens
>Priority: Major
> Fix For: 3.2.4, 3.3.2, 3.4.0
>
>
> In {{python/pyspark/pandas/__init__.py}} there is currently a warning when 
> the {{PYARROW_IGNORE_TIMEZONE}} env var is not set 
> (https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):
> {code:python}
> import logging
> logging.warning(
>     "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "...
> {code}
> The {{logging.warning()}} call silently calls {{logging.basicConfig()}} (at 
> least in Python 3.9, which I tried).
> (FYI: something like {{logging.getLogger(...).warning()}} would not make this 
> silent call.)
> This has a side effect that is very hard to figure out: importing 
> {{pyspark.pandas}} (directly or indirectly) might break your logging setup 
> (if PYARROW_IGNORE_TIMEZONE is not set).
> A very basic example (assuming PYARROW_IGNORE_TIMEZONE is not set):
> {code:python}
> import logging
> import pyspark.pandas
> logging.basicConfig(level=logging.DEBUG)
> logger = logging.getLogger("test")
> logger.warning("I warn you")
> logger.debug("I debug you")
> {code}
> This will only produce the warning line, not the debug line.
> Removing the {{import pyspark.pandas}} makes the debug line appear.






[jira] [Created] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42027:


 Summary: CreateDataframe from Pandas with Struct and Timestamp
 Key: SPARK-42027
 URL: https://issues.apache.org/jira/browse/SPARK-42027
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


The following should be supported and should correctly truncate the nanosecond 
timestamps (Spark timestamps have microsecond precision).

{code:python}
from datetime import timezone, timedelta

import pandas as pd
from pandas import Timestamp

# A timestamp whose nanosecond component must be truncated by Spark.
ts = Timestamp(year=2019, month=1, day=1, nanosecond=500,
               tz=timezone(timedelta(hours=-8)))

d = pd.DataFrame({"col1": [1], "col2": [{"a": 1, "b": 2.32, "c": ts}]})
spark.createDataFrame(d).collect()
{code}







[jira] [Created] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42028:


 Summary: Support Pandas DF to Spark DF with Nanosecond Timestamps
 Key: SPARK-42028
 URL: https://issues.apache.org/jira/browse/SPARK-42028
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Created] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42029:


 Summary: Distribution build for Spark Connect does not work with 
Spark Shell
 Key: SPARK-42029
 URL: https://issues.apache.org/jira/browse/SPARK-42029
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Commented] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675909#comment-17675909
 ] 

Apache Spark commented on SPARK-42028:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/39469

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42028:


Assignee: Apache Spark

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42028:


Assignee: (was: Apache Spark)

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675910#comment-17675910
 ] 

Apache Spark commented on SPARK-42028:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/39469

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675912#comment-17675912
 ] 

Apache Spark commented on SPARK-42029:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/39531

> Distribution build for Spark Connect does not work with Spark Shell
> ---
>
> Key: SPARK-42029
> URL: https://issues.apache.org/jira/browse/SPARK-42029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42029:


Assignee: (was: Apache Spark)

> Distribution build for Spark Connect does not work with Spark Shell
> ---
>
> Key: SPARK-42029
> URL: https://issues.apache.org/jira/browse/SPARK-42029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42029:


Assignee: Apache Spark

> Distribution build for Spark Connect does not work with Spark Shell
> ---
>
> Key: SPARK-42029
> URL: https://issues.apache.org/jira/browse/SPARK-42029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42029:


Assignee: Martin Grund

> Distribution build for Spark Connect does not work with Spark Shell
> ---
>
> Key: SPARK-42029
> URL: https://issues.apache.org/jira/browse/SPARK-42029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>







[jira] [Resolved] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42029.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39531
[https://github.com/apache/spark/pull/39531]

> Distribution build for Spark Connect does not work with Spark Shell
> ---
>
> Key: SPARK-42029
> URL: https://issues.apache.org/jira/browse/SPARK-42029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42010:


Assignee: Hyukjin Kwon

> Reuse pyspark.sql.tests.test_column test cases
> --
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42010.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39528
[https://github.com/apache/spark/pull/39528]

> Reuse pyspark.sql.tests.test_column test cases
> --
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42028:


Assignee: Martin Grund

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>







[jira] [Resolved] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42028.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39469
[https://github.com/apache/spark/pull/39469]

> Support Pandas DF to Spark DF with Nanosecond Timestamps
> 
>
> Key: SPARK-42028
> URL: https://issues.apache.org/jira/browse/SPARK-42028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42019.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39529
[https://github.com/apache/spark/pull/39529]

> Reuse pyspark.sql.tests.test_types test cases
> -
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-42030) Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread Yang Jie (Jira)
Yang Jie created SPARK-42030:


 Summary: Remove unused Constructor from RocksDB.TypeAliases and 
LevelDB.TypeAliases
 Key: SPARK-42030
 URL: https://issues.apache.org/jira/browse/SPARK-42030
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Assigned] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases

2023-01-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42019:


Assignee: Hyukjin Kwon

> Reuse pyspark.sql.tests.test_types test cases
> -
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-42030) Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675963#comment-17675963
 ] 

Apache Spark commented on SPARK-42030:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39532

> Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases
> --
>
> Key: SPARK-42030
> URL: https://issues.apache.org/jira/browse/SPARK-42030
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-42030) Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42030:


Assignee: (was: Apache Spark)

> Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases
> --
>
> Key: SPARK-42030
> URL: https://issues.apache.org/jira/browse/SPARK-42030
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-42030) Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42030:


Assignee: Apache Spark

> Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases
> --
>
> Key: SPARK-42030
> URL: https://issues.apache.org/jira/browse/SPARK-42030
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-40307) Introduce Arrow-optimized Python UDFs

2023-01-12 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675983#comment-17675983
 ] 

Xinrong Meng commented on SPARK-40307:
--

Resolved by https://github.com/apache/spark/pull/39384.

> Introduce Arrow-optimized Python UDFs
> -
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Python user-defined functions (UDFs) enable users to run arbitrary code 
> against PySpark columns. They use Pickle for (de)serialization and execute 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used in the 
> (de)serialization of Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default;
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None; otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> We introduce these two ways to provide both a convenient per-Spark-session 
> control and a finer-grained per-UDF control of the Arrow optimization for 
> Python UDFs.
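
A minimal sketch of the two controls described above (an illustration, not code from the issue; it assumes a Spark 3.4 session available as {{spark}}):

{code:python}
from pyspark.sql.functions import udf

# Session-wide control: applies to UDFs created with useArrow=None.
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")

# Per-UDF control: an explicit useArrow overrides the configuration.
@udf(returnType="long", useArrow=True)
def plus_one(v):
    return v + 1

spark.range(3).select(plus_one("id")).show()
{code}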






[jira] [Resolved] (SPARK-40307) Introduce Arrow-optimized Python UDFs

2023-01-12 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-40307.
--
  Assignee: Xinrong Meng
Resolution: Resolved

> Introduce Arrow-optimized Python UDFs
> -
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Python user-defined functions (UDFs) enable users to run arbitrary code 
> against PySpark columns. They use Pickle for (de)serialization and execute 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used in the 
> (de)serialization of Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default;
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None; otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> We introduce these two ways to provide both a convenient per-Spark-session 
> control and a finer-grained per-UDF control of the Arrow optimization for 
> Python UDFs.






[jira] [Created] (SPARK-42031) Clean up remove methods that do not need override

2023-01-12 Thread Yang Jie (Jira)
Yang Jie created SPARK-42031:


 Summary: Clean up remove methods that do not need override
 Key: SPARK-42031
 URL: https://issues.apache.org/jira/browse/SPARK-42031
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


Java 8 introduced a default implementation of the remove method for the 
`java.util.Iterator` interface (it simply throws 
{{UnsupportedOperationException}}), so overrides that only do the same are 
unnecessary:

https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94






[jira] [Assigned] (SPARK-42031) Clean up remove methods that do not need override

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42031:


Assignee: Apache Spark

> Clean up remove methods that do not need override
> -
>
> Key: SPARK-42031
> URL: https://issues.apache.org/jira/browse/SPARK-42031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Java 8 introduced a default implementation of the remove method for the 
> `java.util.Iterator` interface (it simply throws 
> {{UnsupportedOperationException}}), so overrides that only do the same are 
> unnecessary:
> https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94






[jira] [Commented] (SPARK-42031) Clean up remove methods that do not need override

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676009#comment-17676009
 ] 

Apache Spark commented on SPARK-42031:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39533

> Clean up remove methods that do not need override
> -
>
> Key: SPARK-42031
> URL: https://issues.apache.org/jira/browse/SPARK-42031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Java 8 introduced a default implementation of the remove method for the 
> `java.util.Iterator` interface (it simply throws 
> {{UnsupportedOperationException}}), so overrides that only do the same are 
> unnecessary:
> https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94






[jira] [Assigned] (SPARK-42031) Clean up remove methods that do not need override

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42031:


Assignee: (was: Apache Spark)

> Clean up remove methods that do not need override
> -
>
> Key: SPARK-42031
> URL: https://issues.apache.org/jira/browse/SPARK-42031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Java 8 introduced a default implementation of the remove method for the 
> `java.util.Iterator` interface (it simply throws 
> {{UnsupportedOperationException}}), so overrides that only do the same are 
> unnecessary:
> https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94






[jira] [Created] (SPARK-42032) transform_key, transform_values doctest output have different order

2023-01-12 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42032:
-

 Summary: transform_key, transform_values doctest output have 
different order
 Key: SPARK-42032
 URL: https://issues.apache.org/jira/browse/SPARK-42032
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng


not sure whether this should be fixed:


{code:java}
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
Failed example:
    df.select(transform_keys(
        "data", lambda k, _: upper(k)).alias("data_upper")
    ).show(truncate=False)
Expected:
    +-------------------------+
    |data_upper               |
    +-------------------------+
    |{BAR -> 2.0, FOO -> -2.0}|
    +-------------------------+
Got:
    +-------------------------+
    |data_upper               |
    +-------------------------+
    |{FOO -> -2.0, BAR -> 2.0}|
    +-------------------------+
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
Failed example:
    df.select(transform_values(
        "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
    ).alias("new_data")).show(truncate=False)
Expected:
    +---------------------------------------+
    |new_data                               |
    +---------------------------------------+
    |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
    +---------------------------------------+
Got:
    +---------------------------------------+
    |new_data                               |
    +---------------------------------------+
    |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
    +---------------------------------------+
**********************************************************************
   1 of   2 in pyspark.sql.connect.functions.transform_keys
   1 of   2 in pyspark.sql.connect.functions.transform_values
{code}
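
Map ordering in {{show()}} output is not guaranteed. A minimal sketch (not from the issue) of one way to make such doctests order-insensitive, assuming the {{df}} and {{data}} column from the failing examples above:

{code:python}
from pyspark.sql.functions import map_entries, sort_array, transform_keys, upper

# Sorting the map entries by key yields deterministic output regardless
# of the order in which the map was materialized.
df.select(
    sort_array(map_entries(transform_keys("data", lambda k, _: upper(k))))
    .alias("data_upper_entries")
).show(truncate=False)
{code}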







[jira] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Ruifeng Zheng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41746 ]


Ruifeng Zheng deleted comment on SPARK-41746:
---

was (Author: podongfeng):

{code:java}
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
Failed example:
    df.select(map_filter(
        "data", lambda _, v: v > 30.0).alias("data_filtered")
    ).show(truncate=False)
Expected:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{baz -> 32.0, foo -> 42.0}|
    +--------------------------+
Got:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{foo -> 42.0, baz -> 32.0}|
    +--------------------------+
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
Failed example:
    df.select(map_zip_with(
        "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
    ).show(truncate=False)
Expected:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{SALES -> 16.8, IT -> 48.0}|
    +---------------------------+
Got:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{IT -> 48.0, SALES -> 16.8}|
    +---------------------------+
**********************************************************************
   1 of   2 in pyspark.sql.connect.functions.map_filter
   1 of   2 in pyspark.sql.connect.functions.map_zip_with
{code}


> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
>     df2 = spark.createDataFrame([
>         Row(training="expert", sales=Row(course="dotNET", year=2012, earnings=10000)),
>         Row(training="junior", sales=Row(course="Java", year=2012, earnings=20000)),
>         Row(training="expert", sales=Row(course="dotNET", year=2012, earnings=5000)),
>         Row(training="junior", sales=Row(course="dotNET", year=2013, earnings=48000)),
>         Row(training="expert", sales=Row(course="Java", year=2013, earnings=30000)),
>     ])
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest ...>", line 1, in <module>
>         df2 = spark.createDataFrame([
>       File "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 196, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
>         arrays = [convert_column(c, f)
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
>         raise e
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column 1 with type object')
> {code}






[jira] [Commented] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676059#comment-17676059
 ] 

Ruifeng Zheng commented on SPARK-41746:
---


{code:java}
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
Failed example:
    df.select(map_filter(
        "data", lambda _, v: v > 30.0).alias("data_filtered")
    ).show(truncate=False)
Expected:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{baz -> 32.0, foo -> 42.0}|
    +--------------------------+
Got:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{foo -> 42.0, baz -> 32.0}|
    +--------------------------+
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
Failed example:
    df.select(map_zip_with(
        "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
    ).show(truncate=False)
Expected:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{SALES -> 16.8, IT -> 48.0}|
    +---------------------------+
Got:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{IT -> 48.0, SALES -> 16.8}|
    +---------------------------+
**********************************************************************
   1 of   2 in pyspark.sql.connect.functions.map_filter
   1 of   2 in pyspark.sql.connect.functions.map_zip_with
{code}


> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
>     df2 = spark.createDataFrame([
>         Row(training="expert", sales=Row(course="dotNET", year=2012, earnings=10000)),
>         Row(training="junior", sales=Row(course="Java", year=2012, earnings=20000)),
>         Row(training="expert", sales=Row(course="dotNET", year=2012, earnings=5000)),
>         Row(training="junior", sales=Row(course="dotNET", year=2013, earnings=48000)),
>         Row(training="expert", sales=Row(course="Java", year=2013, earnings=30000)),
>     ])
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest ...>", line 1, in <module>
>         df2 = spark.createDataFrame([
>       File "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 196, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
>         arrays = [convert_column(c, f)
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
>         raise e
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column 1 with type object')
> {code}






[jira] [Commented] (SPARK-42032) transform_key, transform_values doctest output have different order

2023-01-12 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676060#comment-17676060
 ] 

Ruifeng Zheng commented on SPARK-42032:
---


{code:java}
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
Failed example:
    df.select(map_filter(
        "data", lambda _, v: v > 30.0).alias("data_filtered")
    ).show(truncate=False)
Expected:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{baz -> 32.0, foo -> 42.0}|
    +--------------------------+
Got:
    +--------------------------+
    |data_filtered             |
    +--------------------------+
    |{foo -> 42.0, baz -> 32.0}|
    +--------------------------+
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
Failed example:
    df.select(map_zip_with(
        "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
    ).show(truncate=False)
Expected:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{SALES -> 16.8, IT -> 48.0}|
    +---------------------------+
Got:
    +---------------------------+
    |updated_data               |
    +---------------------------+
    |{IT -> 48.0, SALES -> 16.8}|
    +---------------------------+
**********************************************************************
   1 of   2 in pyspark.sql.connect.functions.map_filter
   1 of   2 in pyspark.sql.connect.functions.map_zip_with
{code}


> transform_key, transform_values doctest output have different order
> ---
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this should be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Updated] (SPARK-42032) Map data show in different order

2023-01-12 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-42032:
--
Summary: Map data show in different order  (was: transform_key, 
transform_values doctest output have different order)

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this should be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Updated] (SPARK-42032) Map data show in different order

2023-01-12 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-42032:
--
Description: 
not sure whether this needs to be fixed:


{code:java}
**
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
line 1623, in pyspark.sql.connect.functions.transform_keys
Failed example:
df.select(transform_keys(
"data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
Expected:
+-------------------------+
|data_upper               |
+-------------------------+
|{BAR -> 2.0, FOO -> -2.0}|
+-------------------------+
Got:
+-------------------------+
|data_upper               |
+-------------------------+
|{FOO -> -2.0, BAR -> 2.0}|
+-------------------------+

**
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
line 1630, in pyspark.sql.connect.functions.transform_values
Failed example:
df.select(transform_values(
"data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
).alias("new_data")).show(truncate=False)
Expected:
+---------------------------------------+
|new_data                               |
+---------------------------------------+
|{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
+---------------------------------------+
Got:
+---------------------------------------+
|new_data                               |
+---------------------------------------+
|{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
+---------------------------------------+

**
   1 of   2 in pyspark.sql.connect.functions.transform_keys
   1 of   2 in pyspark.sql.connect.functions.transform_values

{code}


  was:
not sure whether this should be fixed:


{code:java}
**
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
line 1623, in pyspark.sql.connect.functions.transform_keys
Failed example:
df.select(transform_keys(
"data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
Expected:
+-------------------------+
|data_upper               |
+-------------------------+
|{BAR -> 2.0, FOO -> -2.0}|
+-------------------------+
Got:
+-------------------------+
|data_upper               |
+-------------------------+
|{FOO -> -2.0, BAR -> 2.0}|
+-------------------------+

**
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
line 1630, in pyspark.sql.connect.functions.transform_values
Failed example:
df.select(transform_values(
"data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
).alias("new_data")).show(truncate=False)
Expected:
+---------------------------------------+
|new_data                               |
+---------------------------------------+
|{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
+---------------------------------------+
Got:
+---------------------------------------+
|new_data                               |
+---------------------------------------+
|{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
+---------------------------------------+

**
   1 of   2 in pyspark.sql.connect.functions.transform_keys
   1 of   2 in pyspark.sql.connect.functions.transform_values

{code}



> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **
> File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", 
> line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
> df.select(transform_keys(
> "data", lambda k, _: upper(k)).alias("data_upper")
> ).show(truncate=False)
> Expected:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{BAR -> 2.0, FOO -> -2.0}|
> +-------------------------+
> Got:
> +-------------------------+
> |data_upper               |
> +-------------------------+
> |{FOO -> -2.0, BAR -> 2.0}|
> +-------------------------+
> 
> *

[jira] [Assigned] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41746:


Assignee: (was: Apache Spark)

> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in 
> pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
> df2 = spark.createDataFrame([
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=1)),
> Row(training="junior", sales=Row(course="Java", year=2012, 
> earnings=2)),
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=5000)),
> Row(training="junior", sales=Row(course="dotNET", year=2013, 
> earnings=48000)),
> Row(training="expert", sales=Row(course="Java", year=2013, 
> earnings=3)),
> ])
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df2 = spark.createDataFrame([
>   File 
> "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 
> 196, in createDataFrame
> table = pa.Table.from_pandas(pdf)
>   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in dataframe_to_arrays
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in 
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 598, in convert_column
> raise e
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 592, in convert_column
> result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 
> 'Conversion failed for column 1 with type object')
> {code}
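
The traceback shows the failure happens client-side, before anything reaches the server: pandas keeps the nested Row as a plain object column, and Arrow's type inference cannot turn that into a struct. Since Row is a tuple subclass, the same inference problem can be sketched with pandas and pyarrow alone (a minimal reproduction under that assumption, not the Connect code path itself):

{code:python}
import pandas as pd
import pyarrow as pa

# A nested Row such as Row(course="dotNET", year=2012, earnings=10000) is
# stored by pandas as an object-dtype tuple; Arrow guesses an element type
# from the first field and then trips over the mixed str/int contents.
pdf = pd.DataFrame({"training": ["expert"], "sales": [("dotNET", 2012, 10000)]})
try:
    pa.Table.from_pandas(pdf)
except (pa.lib.ArrowTypeError, pa.lib.ArrowInvalid) as e:
    print(e)  # e.g. "Expected bytes, got a 'int' object" for column "sales"
{code}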



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41746:


Assignee: Apache Spark

> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in 
> pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
> df2 = spark.createDataFrame([
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=1)),
> Row(training="junior", sales=Row(course="Java", year=2012, 
> earnings=2)),
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=5000)),
> Row(training="junior", sales=Row(course="dotNET", year=2013, 
> earnings=48000)),
> Row(training="expert", sales=Row(course="Java", year=2013, 
> earnings=3)),
> ])
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df2 = spark.createDataFrame([
>   File 
> "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 
> 196, in createDataFrame
> table = pa.Table.from_pandas(pdf)
>   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in dataframe_to_arrays
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in 
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 598, in convert_column
> raise e
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 592, in convert_column
> result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 
> 'Conversion failed for column 1 with type object')
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676065#comment-17676065
 ] 

Apache Spark commented on SPARK-41746:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in 
> pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
> df2 = spark.createDataFrame([
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=1)),
> Row(training="junior", sales=Row(course="Java", year=2012, 
> earnings=2)),
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=5000)),
> Row(training="junior", sales=Row(course="dotNET", year=2013, 
> earnings=48000)),
> Row(training="expert", sales=Row(course="Java", year=2013, 
> earnings=3)),
> ])
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df2 = spark.createDataFrame([
>   File 
> "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 
> 196, in createDataFrame
> table = pa.Table.from_pandas(pdf)
>   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in dataframe_to_arrays
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in 
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 598, in convert_column
> raise e
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 592, in convert_column
> result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 
> 'Conversion failed for column 1 with type object')
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41838) DataFrame.show() fix map printing

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676067#comment-17676067
 ] 

Apache Spark commented on SPARK-41838:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}
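
The Connect client is dropping the keys when it renders a MapType cell. For reference, a small sketch of the rendering the doctest expects, assuming the classic show() conventions (illustrative only, not the client's actual formatting code):

{code:python}
def format_map_cell(m):
    # Classic show() renders a map cell as "{key -> value, ...}",
    # an empty map as "{}", and a missing (None) cell as "null".
    if m is None:
        return "null"
    return "{" + ", ".join(f"{k} -> {v}" for k, v in m.items()) + "}"

print(format_map_cell({"x": 1.0}))  # {x -> 1.0}
print(format_map_cell({}))          # {}
print(format_map_cell(None))        # null
{code}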



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41838) DataFrame.show() fix map printing

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41838:


Assignee: Apache Spark

> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41746) SparkSession.createDataFrame does not support nested datatypes

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676066#comment-17676066
 ] 

Apache Spark commented on SPARK-41746:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> SparkSession.createDataFrame does not support nested datatypes
> --
>
> Key: SPARK-41746
> URL: https://issues.apache.org/jira/browse/SPARK-41746
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/group.py", line 183, in 
> pyspark.sql.connect.group.GroupedData.pivot
> Failed example:
> df2 = spark.createDataFrame([
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=1)),
> Row(training="junior", sales=Row(course="Java", year=2012, 
> earnings=2)),
> Row(training="expert", sales=Row(course="dotNET", year=2012, 
> earnings=5000)),
> Row(training="junior", sales=Row(course="dotNET", year=2013, 
> earnings=48000)),
> Row(training="expert", sales=Row(course="Java", year=2013, 
> earnings=3)),
> ])
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df2 = spark.createDataFrame([
>   File 
> "/.../workspace/forked/spark/python/pyspark/sql/connect/session.py", line 
> 196, in createDataFrame
> table = pa.Table.from_pandas(pdf)
>   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in dataframe_to_arrays
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 611, in 
> arrays = [convert_column(c, f)
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 598, in convert_column
> raise e
>   File 
> "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/pyarrow/pandas_compat.py",
>  line 592, in convert_column
> result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 
> 'Conversion failed for column 1 with type object')
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41838) DataFrame.show() fix map printing

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41838:


Assignee: (was: Apache Spark)

> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41838) DataFrame.show() fix map printing

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676068#comment-17676068
 ] 

Apache Spark commented on SPARK-41838:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676071#comment-17676071
 ] 

Apache Spark commented on SPARK-41837:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}
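
Column 1 here holds Row objects (the to_json doctest uses Row(age=2, name='Alice')); pandas stores them as object dtype, Arrow guesses int64 from the first field, and the string then fails to convert. One hedged workaround sketch, assuming the DDL-string schema form of createDataFrame is honored by the Connect client and that an active session named spark exists:

{code:python}
from pyspark.sql import Row

# Spell out the struct type instead of relying on pandas/Arrow inference.
data = [(1, Row(age=2, name="Alice"))]
df = spark.createDataFrame(data, "key INT, value STRUCT<age: INT, name: STRING>")
df.show(truncate=False)
{code}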



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676072#comment-17676072
 ] 

Apache Spark commented on SPARK-41837:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676073#comment-17676073
 ] 

Apache Spark commented on SPARK-41837:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41838) DataFrame.show() fix map printing

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676070#comment-17676070
 ] 

Apache Spark commented on SPARK-41838:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41837:


Assignee: Apache Spark

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41837:


Assignee: (was: Apache Spark)

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676074#comment-17676074
 ] 

Apache Spark commented on SPARK-41837:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame.createDataFrame datatype conversion error
> ---
>
> Key: SPARK-41837
> URL: https://issues.apache.org/jira/browse/SPARK-41837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1594, in pyspark.sql.connect.functions.to_json
> Failed example:
>     df = spark.createDataFrame(data, ("key", "value"))
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df = spark.createDataFrame(data, ("key", "value"))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 252, in createDataFrame
>         table = pa.Table.from_pandas(pdf)
>       File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in dataframe_to_arrays
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, 
> in 
>         arrays = [convert_column(c, f)
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, 
> in convert_column
>         raise e
>       File 
> "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, 
> in convert_column
>         result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>       File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>       File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>       File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: 
> tried to convert to int64", 'Conversion failed for column 1 with type 
> object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41835:


Assignee: (was: Apache Spark)

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}
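
The analysis error shows the real problem: "data" reached the server typed as STRUCT, not MAP, i.e. the dict was converted to a struct on the way in. A sketch of the intended call, assuming an explicit MapType schema keeps the column a map and that spark is an active session:

{code:python}
from pyspark.sql.functions import transform_keys, upper
from pyspark.sql.types import (
    DoubleType, LongType, MapType, StringType, StructField, StructType,
)

schema = StructType([
    StructField("id", LongType()),
    StructField("data", MapType(StringType(), DoubleType())),
])
df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], schema)

# With "data" typed as MAP<STRING, DOUBLE>, transform_keys resolves.
df.select(
    transform_keys("data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
{code}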



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676075#comment-17676075
 ] 

Apache Spark commented on SPARK-41835:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41835:


Assignee: Apache Spark

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676076#comment-17676076
 ] 

Apache Spark commented on SPARK-41835:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676078#comment-17676078
 ] 

Apache Spark commented on SPARK-41835:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41836:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
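
For reference, a minimal usage sketch of the function being implemented, mirroring the classic PySpark transform_values doctest (assumes an active session named spark):

{code:python}
from pyspark.sql.functions import transform_values, when

df = spark.createDataFrame(
    [(1, {"IT": 10.0, "SALES": 2.0, "OPS": 24.0})], ("id", "data")
)
# Add 10.0 to the IT and OPS values, leave the others unchanged.
df.select(
    transform_values(
        "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
    ).alias("new_data")
).show(truncate=False)
{code}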




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676082#comment-17676082
 ] 

Apache Spark commented on SPARK-41835:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}
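
For reference (not part of the report above): `transform_keys` resolves only when its first argument is a genuine MAP column, so the analysis error quoted here is expected for a struct input. A minimal sketch of the passing shape, assuming a live Spark session named `spark`:

{code:python}
# Minimal sketch (assumes a running Spark session `spark`): transform_keys
# requires a MAP column; a Python dict column maps to MAP<STRING, DOUBLE>.
from pyspark.sql.functions import transform_keys, upper

df = spark.createDataFrame(
    [(1, {"foo": -2.0, "bar": 2.0})], ("id", "data")
)
df.select(
    transform_keys("data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
# {FOO -> -2.0, BAR -> 2.0}
{code}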



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41836:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676083#comment-17676083
 ] 

Apache Spark commented on SPARK-41835:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41847:


Assignee: (was: Apache Spark)

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
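
For reference, `explode` with a two-name alias needs `mapfield` to be an actual MAP column, which yields one (key, value) row per entry. A minimal sketch of the passing shape, assuming a live `spark` session:

{code:python}
# Minimal sketch (assumes a running Spark session `spark`): explode over a
# MAP column produces key/value pairs that the two-name alias can bind.
from pyspark.sql import Row
from pyspark.sql.functions import explode

eDF = spark.createDataFrame([Row(a=1, intlist=[1, 2, 3], mapfield={"a": "b"})])
eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
# +---+-----+
# |key|value|
# +---+-----+
# |  a|    b|
# +---+-----+
{code}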
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
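
Likewise, `inline` expects an ARRAY of STRUCT and emits one row per struct element. A minimal sketch of the passing shape, assuming a live `spark` session:

{code:python}
# Minimal sketch (assumes a running Spark session `spark`): inline flattens an
# ARRAY<STRUCT> column, one output row per struct element.
from pyspark.sql import Row
from pyspark.sql.functions import inline

df = spark.createDataFrame([Row(structlist=[Row(a=1, b=2), Row(a=3, b=4)])])
df.select(inline(df.structlist)).show()
# +---+---+
# |  a|  b|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+
{code}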
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n,
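
(The digest cuts the map_filter traceback off at this point.) For reference, `map_filter` likewise requires a MAP input; it keeps the entries whose key/value pass the predicate. A minimal sketch of the passing shape, assuming a live `spark` session:

{code:python}
# Minimal sketch (assumes a running Spark session `spark`): map_filter keeps
# the MAP entries for which the (key, value) predicate returns true.
from pyspark.sql.functions import map_filter

df = spark.createDataFrame(
    [(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data")
)
df.select(
    map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")
).show(truncate=False)
# {foo -> 42.0, baz -> 32.0}
{code}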

[jira] [Assigned] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41847:


Assignee: Apache Spark

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         p

[jira] [Commented] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676080#comment-17676080
 ] 

Apache Spark commented on SPARK-41836:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676085#comment-17676085
 ] 

Apache Spark commented on SPARK-41836:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676090#comment-17676090
 ] 

Apache Spark commented on SPARK-41847:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh

[jira] [Commented] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676094#comment-17676094
 ] 

Apache Spark commented on SPARK-41847:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh

[jira] [Commented] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676099#comment-17676099
 ] 

Apache Spark commented on SPARK-41847:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh

[jira] [Commented] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676096#comment-17676096
 ] 

Apache Spark commented on SPARK-41836:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41836) Implement `transform_values` function

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676093#comment-17676093
 ] 

Apache Spark commented on SPARK-41836:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676100#comment-17676100
 ] 

Apache Spark commented on SPARK-41847:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39535

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1364, in pyspark.sql.connect.functions.inline
> Failed example:
>     df.select(inline(df.structlist)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(inline(df.structlist)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
> of type "ARRAY" while it's required to be "STRUCT BIGINT>".
>     Plan:  {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1411, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, 
> in 
>         df.select(map_filter(
>       File 
> "/Users/s.singh

[jira] [Created] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-12 Thread Pankaj Nagla (Jira)
Pankaj Nagla created SPARK-42033:


 Summary: Docker Tag Error 25 on gitlab-ci.yml trying to start 
GitLab Pipeline
 Key: SPARK-42033
 URL: https://issues.apache.org/jira/browse/SPARK-42033
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.1.1
Reporter: Pankaj Nagla
 Fix For: 1.6.2


I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"


cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [Aws Sysops 
Training|[https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]]
 follow the page.

Thanks
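
For what it's worth, `invalid reference format` is raised by Docker's reference parser before any build work starts, and one plausible cause here (not confirmed in the thread) is the `chris_` namespace: each path component of a repository name must start and end with a lowercase alphanumeric, so a trailing underscore is rejected. A small check approximating the path-component grammar from the Docker distribution reference spec; the renamed namespace is hypothetical:

{code:python}
# Minimal sketch: approximate Docker's repository path-component grammar and
# test the two tags. Components must begin and end with [a-z0-9].
import re

PATH_COMPONENT = re.compile(r"^[a-z0-9]+(?:(?:[._]|__|[-]+)[a-z0-9]+)*$")

def valid_repository(path: str) -> bool:
    # Every "/"-separated component must match the grammar.
    return all(PATH_COMPONENT.match(c) for c in path.split("/"))

print(valid_repository("chris_/talk-booking"))  # False: "chris_" ends in "_"
print(valid_repository("chris/talk-booking"))   # True (hypothetical rename)
{code}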



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-12 Thread Pankaj Nagla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Nagla updated SPARK-42033:
-
Description: 
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [#Aws Sysops 
Training][https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]
 follow the page.

Thanks

  was:
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"


cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [Aws Sysops 
Training|[https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]]
 follow the page.

Thanks


> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
> Fix For: 1.6.2
>
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
>     stages:
>   - docker
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
> $CI_REGISTRY
>     - docker build -t 
> registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> My Pipeline fails with this error:
> See 
> [https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim 
> .
> invalid argument 
> "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" 
> flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> It may or may not be relevant but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed in this [#Aws Sysops 
> Training][https://www.igmg

[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-12 Thread Pankaj Nagla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Nagla updated SPARK-42033:
-
Description: 
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [[Aws Sysops Training||#Aws Sysops Training] 
[https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/] 
[]|#Aws Sysops Training] follow the page.

Thanks

  was:
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [#Aws Sysops Training] follow the page.

Thanks


> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
> Fix For: 1.6.2
>
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
>     stages:
>   - docker
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
> $CI_REGISTRY
>     - docker build -t 
> registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> My Pipeline fails with this error:
> See 
> [https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim 
> .
> invalid argument 
> "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" 
> flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> It may or may not be relevant but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed in this [[Aws Sysops Training||#Aws Sysops Training] 
> [https://www.igmguru.co

[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-12 Thread Pankaj Nagla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Nagla updated SPARK-42033:
-
Description: 
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [#Aws Sysops Training] follow the page.

Thanks

  was:
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [#Aws Sysops 
Training][https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]
 follow the page.

Thanks


> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
> Fix For: 1.6.2
>
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
>     stages:
>   - docker
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
> $CI_REGISTRY
>     - docker build -t 
> registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> My Pipeline fails with this error:
> See 
> [https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim 
> .
> invalid argument 
> "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" 
> flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> It may or may not be relevant but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed in this [#Aws Sysops Training] follow the page.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-12 Thread Pankaj Nagla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Nagla updated SPARK-42033:
-
Description: 
I'm going through the "Scalable FastAPI Application on AWS" course. My 
gitlab-ci.yml file is below.

    stages:
  - docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  key: ${CI_JOB_NAME}
  paths:
    - ${CI_PROJECT_DIR}/services/talk_booking/.venv/

build-python-ci-image:
  image: docker:19.03.0
  services:
    - docker:19.03.0-dind
  stage: docker
  before_script:
    - cd ci_cd/python/
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" 
$CI_REGISTRY
    - docker build -t 
registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
    - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim

My Pipeline fails with this error:

See 
[https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
Login Succeeded
$ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" 
for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 125

It may or may not be relevant but the Container Registry for the GitLab project 
says there's a Docker connection error. All these problems have been discussed 
in this [Aws Sysops 
Training|[https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]]
 follow the page.

Thanks



> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
> Fix For: 1.6.2
>
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My
> gitlab-ci.yml file is below.
>
>     stages:
>       - docker
>
>     variables:
>       DOCKER_DRIVER: overlay2
>       DOCKER_TLS_CERTDIR: "/certs"
>
>     cache:
>       key: ${CI_JOB_NAME}
>       paths:
>         - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
>
>     build-python-ci-image:
>       image: docker:19.03.0
>       services:
>         - docker:19.03.0-dind
>       stage: docker
>       before_script:
>         - cd ci_cd/python/
>       script:
>         - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
>         - docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>         - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
>
> My Pipeline fails with this error:
>
> See [https://docs.docker.com/engine/reference/commandline/login/#credentials-store]
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
> invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> It may or may not be relevant, but the Container Registry page for the GitLab
> project says there's a Docker connection error. All these problems have been
> discussed on this [Aws Sysops Training|https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/] page.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


[jira] [Commented] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark

2023-01-12 Thread Nick Hryhoriev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676265#comment-17676265
 ] 

Nick Hryhoriev commented on SPARK-36263:


Hi, I find this feature does not work for `foreach` or `foreachPartition`
actions, maybe because they use `rdd.foreach` under the hood. It looks like
`foreach` and `foreachPartition` do not go through `QueryExecutionListener`,
so the observed metrics are never reported.
Example to reproduce:
[https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3]
Am I missing something, or is this a known issue?
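
A minimal sketch of the observation API under discussion, assuming Spark >= 3.3
(where pyspark.sql.Observation is available); per the report above, the metrics
may never be filled in when the only action is foreach/foreachPartition, since
those run through the RDD path:

    from pyspark.sql import SparkSession, Observation
    from pyspark.sql.functions import count, max as max_

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(100)

    # Attach named metrics to the plan; they are collected as a side effect
    # of executing a SQL action over the observed DataFrame.
    obs = Observation("stats")
    observed = df.observe(obs, count("id").alias("rows"), max_("id").alias("max_id"))

    observed.count()   # a SQL action such as count() populates the metrics
    print(obs.get)     # {'rows': 100, 'max_id': 99}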

> Add Dataset.observe(Observation, Column, Column*) to PySpark
> 
>
> Key: SPARK-36263
> URL: https://issues.apache.org/jira/browse/SPARK-36263
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Major
> Fix For: 3.3.0
>
>
> With SPARK-34806 we now have a way to use the `Dataset.observe` method 
> without the need to interact with 
> `org.apache.spark.sql.util.QueryExecutionListener`. This allows us to easily 
> retrieve observations in PySpark.
> Adding a `Dataset.observe(Observation, Column, Column*)` equivalent to
> PySpark's `DataFrame` is straightforward and allows observations to be
> utilised from Python.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



