kaxil commented on code in PR #64180:
URL: https://github.com/apache/airflow/pull/64180#discussion_r2983634829
##########
providers/google/src/airflow/providers/google/marketing_platform/operators/bid_manager.py:
##########
@@ -22,6 +22,7 @@
import json
import shutil
import tempfile
+import urllib.parse
Review Comment:
The PR adds `import urllib.parse` but line 28 already has `from urllib.parse
import urlsplit`. After this change you have both styles in the same file.
Simpler to add `urlparse` to the existing import: `from urllib.parse import
urlparse, urlsplit`, then use bare `urlparse(file_url)` below.
##########
providers/google/src/airflow/providers/google/marketing_platform/operators/bid_manager.py:
##########
@@ -317,8 +318,15 @@ def execute(self, context: Context):
# If no custom report_name provided, use Bid Manager name
file_url = resource["metadata"]["googleCloudStoragePath"]
- if urllib.parse.urlparse(file_url).scheme == "file":
- raise AirflowException("Accessing local file is not allowed in
this operator")
+ parsed_url = urllib.parse.urlparse(file_url)
Review Comment:
Nice switch from blocklist to allowlist. One thing to consider:
`urllib.request.urlopen` (line 334 below) follows HTTP redirects by default.
The validation here checks the initial URL, but a 302 from a GCS host could
theoretically redirect to an arbitrary destination. In practice GCS doesn't do
that, but for a security-focused change, either a comment explaining why this
is acceptable or a custom opener that rejects redirects would close the gap.
##########
providers/google/tests/unit/google/marketing_platform/operators/test_bid_manager.py:
##########
@@ -74,7 +74,16 @@ def teardown_method(self):
session.execute(delete(TI))
@pytest.mark.parametrize(
- ("file_path", "should_except"), [("https://host/path", False),
("file:/path/to/file", True)]
+ ("file_path", "should_except"),
+ [
+ ("https://storage.googleapis.com/bucket/report.csv", False),
+ ("https://storage.cloud.google.com/bucket/report.csv", False),
+ ("file:/path/to/file", True),
+ ("http://storage.googleapis.com/bucket/report.csv", True),
+ ("https://evil.com/report.csv", True),
+ ("https://internal-service.local/secret", True),
+ ("ftp://storage.googleapis.com/bucket/report.csv", True),
Review Comment:
Good coverage. Worth adding a userinfo-based bypass attempt too, something
like `("https://[email protected]/report.csv", True)`. `urlparse`
handles this correctly (hostname resolves to `evil.com`), but having it in the
test suite documents awareness of that attack vector.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]