Nataneljpwd commented on code in PR #61528:
URL: https://github.com/apache/airflow/pull/61528#discussion_r2774220545


##########
providers/apache/spark/src/airflow/providers/apache/spark/hooks/spark_submit.py:
##########
@@ -265,10 +265,23 @@ def _resolve_connection(self) -> dict[str, Any]:
             # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT and
             # k8s://https://<HOST>:<PORT>
             conn = self.get_connection(self._conn_id)
-            if conn.port:
-                conn_data["master"] = f"{conn.host}:{conn.port}"
+
+            # connection comes from UI or it's a spark-master
+            if conn.conn_type == "spark":
+                if conn.host and ("://" in conn.host or not conn.port):
+                    conn_data["master"] = conn.host
+                elif conn.port:  # spark master/standalone has port
+                    conn_data["master"] = f"spark://{conn.host or ''}"
             else:
-                conn_data["master"] = conn.host
+                if conn.conn_type != "yarn":
+                    # For other conn_types (mesos, k8s, local, etc.): reconstruct URL
+                    conn_data["master"] = f"{conn.conn_type}://{conn.host or ''}"
+                else:
+                    conn_data["master"] = conn.host
+
+            # Append port if provided
+            if conn.port and conn_data["master"]:

Review Comment:
   Indexing `conn_data["master"]` here will raise a `KeyError` if the master was 
never set. It is safer to use `conn_data.get("master")`, and the check can also 
be done inline.
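
   As a minimal sketch of the `.get` suggestion: the helper name `append_port` 
and the plain-dict `conn_data` below are hypothetical stand-ins, and the line 
that actually appends the port is not shown in the quoted hunk, so its form 
here is an assumption.

```python
from typing import Any, Optional


def append_port(conn_data: dict[str, Any], port: Optional[int]) -> dict[str, Any]:
    """Append ``:port`` to the master URL only when a master is present.

    Hypothetical helper: ``dict.get`` returns None instead of raising
    KeyError when the "master" key was never set.
    """
    master = conn_data.get("master")
    if port and master:
        conn_data["master"] = f"{master}:{port}"
    return conn_data


# A missing "master" key is left alone rather than raising KeyError.
print(append_port({}, 7077))
# An existing master gets the port appended.
print(append_port({"master": "spark://example-host"}, 7077))
```

   Whether an unset master should be skipped silently or reported as a 
configuration error is a separate design choice that the sketch does not settle.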



##########
providers/apache/spark/src/airflow/providers/apache/spark/hooks/spark_submit.py:
##########
@@ -265,10 +265,23 @@ def _resolve_connection(self) -> dict[str, Any]:
             # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT and
             # k8s://https://<HOST>:<PORT>
             conn = self.get_connection(self._conn_id)
-            if conn.port:
-                conn_data["master"] = f"{conn.host}:{conn.port}"
+
+            # connection comes from UI or it's a spark-master
+            if conn.conn_type == "spark":
+                if conn.host and ("://" in conn.host or not conn.port):

Review Comment:
   Can there be a case where the host is set AND the port is not set?
   
   What if I use a custom secrets backend rather than the regular UI to store 
connections? I can see a problem here where the conn type is spark, yet a port 
is not set and '://' is not in the host string, which causes nothing to happen, 
and the application later fails with a `KeyError`.
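
   One way to check which inputs fall through both branches is to trace the 
quoted `spark` condition in isolation. The function `spark_branch` below is a 
hypothetical stand-in that copies only the two branches shown in the hunk and 
starts from an empty dict, which is an assumption about `conn_data`.

```python
from typing import Any, Optional


def spark_branch(host: Optional[str], port: Optional[int]) -> dict[str, Any]:
    """Replicate only the quoted ``conn_type == "spark"`` branches."""
    conn_data: dict[str, Any] = {}  # assumption: no "master" default is present
    if host and ("://" in host or not port):
        conn_data["master"] = host
    elif port:  # spark master/standalone has port
        conn_data["master"] = f"spark://{host or ''}"
    return conn_data


# Enumerate host/port combinations and report which ones set a master.
cases = [
    ("spark://example-host", 7077),
    ("example-host", 7077),
    ("example-host", None),
    (None, 7077),
    (None, None),
]
for host, port in cases:
    result = spark_branch(host, port)
    print(f"host={host!r} port={port!r} -> {result.get('master', '<unset>')!r}")
```

   In this trace the only combination that leaves the master unset is an empty 
host with no port, while a bare host without a port is passed through 
unchanged, so any fallback would need to cover the empty-host case at least.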



##########
providers/apache/spark/src/airflow/providers/apache/spark/hooks/spark_submit.py:
##########
@@ -265,10 +265,23 @@ def _resolve_connection(self) -> dict[str, Any]:
             # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT and
             # k8s://https://<HOST>:<PORT>
             conn = self.get_connection(self._conn_id)
-            if conn.port:
-                conn_data["master"] = f"{conn.host}:{conn.port}"
+
+            # connection comes from UI or it's a spark-master
+            if conn.conn_type == "spark":
+                if conn.host and ("://" in conn.host or not conn.port):
+                    conn_data["master"] = conn.host
+                elif conn.port:  # spark master/standalone has port
+                    conn_data["master"] = f"spark://{conn.host or ''}"
             else:
-                conn_data["master"] = conn.host
+                if conn.conn_type != "yarn":
+                    # For other conn_types (mesos, k8s, local, etc.): reconstruct URL
+                    conn_data["master"] = f"{conn.conn_type}://{conn.host or ''}"
+                else:
+                    conn_data["master"] = conn.host

Review Comment:
   I think this whole logic can be simplified. Why not check whether the 
connection type is yarn OR spark and set the master with an inline if 
expression, and otherwise set the master accordingly?
   
   This would reduce nesting and make the logic simpler to understand and change 
in the future.
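
   The flattening suggested above could look roughly like the following. This is 
one possible reading, not the PR's code: `resolve_master` is a hypothetical 
free function taking plain values instead of a connection object, and it 
assumes an empty host should yield an empty master rather than a missing key.

```python
from typing import Optional


def resolve_master(conn_type: str, host: Optional[str], port: Optional[int]) -> str:
    """Build a Spark master URL with a single level of branching."""
    host = host or ""
    if conn_type in ("spark", "yarn"):
        # Keep the host as-is for yarn, for hosts that already carry a
        # scheme, and for port-less connections; otherwise add spark://.
        keep_host = conn_type == "yarn" or "://" in host or not port
        master = host if keep_host else f"spark://{host}"
    else:
        # Other conn_types (mesos, k8s, local, etc.): reconstruct the URL.
        master = f"{conn_type}://{host}"
    # Append the port only when there is a master to append it to.
    return f"{master}:{port}" if port and master else master


print(resolve_master("spark", "example-host", 7077))
print(resolve_master("k8s", "https://example-host", 443))
print(resolve_master("yarn", "yarn", None))
```

   Returning the value instead of mutating `conn_data` keeps the port handling 
in one place, at the cost of the caller having to decide what an empty result 
means.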


