[ https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771836#comment-17771836 ]
Nikita Awasthi commented on SPARK-45093:
----------------------------------------

User 'cdkrot' has created a pull request for this issue:
https://github.com/apache/spark/pull/43216

> AddArtifacts should give proper error messages if it fails
> ----------------------------------------------------------
>
>                 Key: SPARK-45093
>                 URL: https://issues.apache.org/jira/browse/SPARK-45093
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Alice Sayutina
>            Assignee: Alice Sayutina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> I've been trying to do some testing of UDFs using code in another module, so AddArtifacts is necessary.
>
> I got the following error:
>
> {code:java}
> Traceback (most recent call last):
>   File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in <module>
>     spark.addArtifacts("udf2_support.py", pyfile=True)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py", line 744, in addArtifacts
>     self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, file=file)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1582, in add_artifacts
>     self._artifact_manager.add_artifacts(*path, pyfile=pyfile, archive=archive, file=file)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", line 283, in add_artifacts
>     self._request_add_artifacts(requests)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", line 259, in _request_add_artifacts
>     response: proto.AddArtifactsResponse = self._retrieve_responses(requests)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", line 256, in _retrieve_responses
>     return self._stub.AddArtifacts(requests, metadata=self._metadata)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", line 1246, in __call__
>     return _end_unary_response_blocking(state, call, False, None)
>   File "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
>     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
>     status = StatusCode.UNKNOWN
>     details = "Exception iterating requests!"
>     debug_error_string = "None"
> {code}
>
> This doesn't give any clue about what actually happened. Only after considerable investigation did I find the problem: I was specifying the wrong path, so the artifact failed to upload. Specifically, ArtifactManager doesn't read the file immediately; instead it creates an iterator object that incrementally generates the requests to send. This iterator is passed to grpc's stream-unary call to consume and actually send, and while grpc catches the error (see above), it suppresses the underlying exception.
>
> I think we should improve the PySpark user experience. One possible way to fix this is to wrap ArtifactManager._create_requests with an iterator wrapper that logs the throwable to the Spark Connect logger, so that the user would see something like the message below (shown after the sketch) at least when debug mode is on.
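> For illustration, a minimal sketch of such a wrapper, assuming a plain stdlib logger in place of the Spark Connect logger (the helper name _log_request_errors is hypothetical; the actual change is in the pull request linked above):
>
> {code:python}
> import logging
>
> # Stand-in for the Spark Connect logger (an assumption for this sketch,
> # not the actual PySpark API).
> logger = logging.getLogger("pyspark.sql.connect.client.artifact")
>
>
> def _log_request_errors(requests):
>     """Yield each request unchanged, but log any exception raised while
>     the requests are being generated, before grpc swallows it."""
>     try:
>         for request in requests:
>             yield request
>     except Exception:
>         # logger.exception records the full traceback of the underlying error.
>         logger.exception("Error while generating AddArtifacts requests")
>         raise
> {code}
>
> _retrieve_responses could then pass _log_request_errors(requests) to self._stub.AddArtifacts, so the underlying error lands in the log even though grpc still raises the generic _InactiveRpcError.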
>
> {code:java}
> FileNotFoundError: [Errno 2] No such file or directory: '/Users/alice.sayutina/udf2_support.py'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org