Nataneljpwd commented on code in PR #61528:
URL: https://github.com/apache/airflow/pull/61528#discussion_r2774556741


##########
providers/apache/spark/src/airflow/providers/apache/spark/hooks/spark_submit.py:
##########
@@ -265,10 +265,23 @@ def _resolve_connection(self) -> dict[str, Any]:
             # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT and
             # k8s://https://<HOST>:<PORT>
             conn = self.get_connection(self._conn_id)
-            if conn.port:
-                conn_data["master"] = f"{conn.host}:{conn.port}"
+
+            # connection comes from UI or it's a spark-master
+            if conn.conn_type == "spark":
+                if conn.host and ("://" in conn.host or not conn.port):
+                    conn_data["master"] = conn.host
+                elif conn.port:  # spark master/standalone has port
+                    conn_data["master"] = f"spark://{conn.host or ''}"
             else:
-                conn_data["master"] = conn.host
+                if conn.conn_type != "yarn":
+                    # For other conn_types (mesos, k8s, local, etc.): reconstruct URL
+                    conn_data["master"] = f"{conn.conn_type}://{conn.host or ''}"
+                else:
+                    conn_data["master"] = conn.host

Review Comment:
   > I understand the complexity, but the complexity comes from answering this question:
   > What does `spark` in the connection mean?
   > And it has two answers:
   > 1. `spark` is an Airflow `connection_type`, such as `spark://local` or `spark://yarn`
   > 2. `spark` refers to the Spark master / standalone mode
   
   It can be simplified: you do not end up with different logic for the `yarn` and `spark` connection types, so the else block is redundant.
   I proposed a way to simplify the if statements while retaining the same behavior.
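   For illustration only (the reviewer's actual suggestion is not quoted in this message), a minimal sketch of how the nested else could be flattened into an elif chain while keeping the same outcomes as the diff above. It reuses the `conn` and `conn_data` variables from the hunk:

   ```python
   # Illustrative sketch only, not the reviewer's actual proposal: flatten the
   # nested else under the non-spark branch into a plain elif chain.
   # Assumes `conn` and `conn_data` as defined earlier in _resolve_connection().
   if conn.conn_type == "spark":
       if conn.host and ("://" in conn.host or not conn.port):
           # host already carries a scheme, or there is no port: use it as-is
           conn_data["master"] = conn.host
       elif conn.port:  # spark master / standalone has a port
           conn_data["master"] = f"spark://{conn.host or ''}"
   elif conn.conn_type == "yarn":
       conn_data["master"] = conn.host
   else:
       # mesos, k8s, local, etc.: reconstruct the URL from the conn_type
       conn_data["master"] = f"{conn.conn_type}://{conn.host or ''}"
   ```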

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
