[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449667794



##
File path: tests/plugins/test_plugin_ignore.py
##
@@ -0,0 +1,96 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        os.mkdir(os.path.join(self.test_dir, "test_ignore"))
+        with open(os.path.join(self.plugin_folder_path, "test_load.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, ".airflowignore"), "w") as file:
+            file.write("#ignore test\nnot\nsubdir2")
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir1"))
+        with open(os.path.join(self.plugin_folder_path, "subdir1/.airflowignore"), "w") as file:
+            file.write("#ignore test\nnone")
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_load_sub1.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')

Review comment:
   Yes!
   The notation you suggest is better than my existing code.
   
   I have fixed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+"""
+Test load operator
+"""
+@apply_defaults
+def __init__(
+self,
+*args,
+**kwargs):
+super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+def execute(self, context):
+pass

Review comment:
   These files were used as the files that should not be ignored
   (see `self.assertEqual(detected_files, should_not_ignore_files)` at line 87,
   now line 95, of test_plugin_ignore.py).
   
   However, I changed these files and the ".airflowignore" files so that they are
   generated by test_plugin_ignore.py, and deleted them from the pull request!
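For illustration, generating the fixture tree inside the test itself could look like the sketch below. The class name `TestGeneratedIgnoreFixtures`, the `fixtures` mapping, and `test_fixtures_exist` are hypothetical; the file names and contents mirror those in the test_plugin_ignore.py hunk, and this is not the exact code in the pull request.

```python
import os
import shutil
import tempfile
import unittest


class TestGeneratedIgnoreFixtures(unittest.TestCase):
    """Hypothetical test case: every fixture file is created in setUp."""

    def setUp(self):
        self.test_dir = tempfile.mkdtemp()
        self.plugin_folder_path = os.path.join(self.test_dir, "test_ignore")
        # Relative path -> file content; names mirror the ones in the diff.
        fixtures = {
            ".airflowignore": "#ignore test\nnot\nsubdir2",
            "test_load.py": "#Should not be ignored file",
            "subdir1/.airflowignore": "#ignore test\nnone",
            "subdir1/test_load_sub1.py": "#Should not be ignored file",
        }
        for relative_path, content in fixtures.items():
            full_path = os.path.join(self.plugin_folder_path, relative_path)
            # Create parent directories on demand instead of one os.mkdir per folder.
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
            with open(full_path, "w") as file:
                file.write(content)

    def tearDown(self):
        # Remove the whole temporary tree so nothing leaks between tests.
        shutil.rmtree(self.test_dir)

    def test_fixtures_exist(self):
        expected = os.path.join(self.plugin_folder_path, "subdir1", "test_load_sub1.py")
        self.assertTrue(os.path.isfile(expected))
```

Because nothing is read from the repository, no fixture modules need to be committed alongside the test.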









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+"""
+Test load operator
+"""
+@apply_defaults
+def __init__(
+self,
+*args,
+**kwargs):
+super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+def execute(self, context):
+pass

Review comment:
   These files were used as the files that should not be ignored
   (see `self.assertEqual(detected_files, should_not_ignore_files)` at line 87,
   now line 95, of test_plugin_ignore.py).
   
   However, I changed these files so that they are generated by
   test_plugin_ignore.py, and deleted these files!









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+"""
+Test load operator
+"""
+@apply_defaults
+def __init__(
+self,
+*args,
+**kwargs):
+super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+def execute(self, context):
+pass

Review comment:
   These files were used as the files that should not be ignored
   (see `self.assertEqual(detected_files, should_not_ignore_files)` at line 87,
   now line 95, of test_plugin_ignore.py).
   
   However, I changed these files so that they are generated by
   test_plugin_ignore.py, and deleted them from the pull request!









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+"""
+Test load operator
+"""
+@apply_defaults
+def __init__(
+self,
+*args,
+**kwargs):
+super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+def execute(self, context):
+pass

Review comment:
   These files were used as the files that should not be ignored
   (see `self.assertEqual(detected_files, should_not_ignore_files)` at line 87,
   now line 95, of test_plugin_ignore.py).
   
   However, I changed these files so that they are generated by
   test_plugin_ignore.py.









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615262



##
File path: tests/plugins/test_plugin_ignore.py
##
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+"""
+Test that the .airflowignore work and whether the file is properly ignored.
+"""
+
+def setUp(self):
+"""
+Make tmp folder and files that should be ignored. And set base path.
+"""
+self.test_dir = tempfile.mkdtemp()
+self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), 
self.plugin_folder_path)
+file = open(os.path.join(self.plugin_folder_path, 
"test_notload_sub.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()
+file = open(os.path.join(self.plugin_folder_path, 
"subdir1/test_noneload_sub1.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()
+os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+file = open(os.path.join(self.plugin_folder_path, 
"subdir2/test_shouldignore.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()

Review comment:
   Of course!
   I have fixed it.

##
File path: tests/plugins/test_plugin_ignore.py
##
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+"""
+Test that the .airflowignore work and whether the file is properly ignored.
+"""
+
+def setUp(self):
+"""
+Make tmp folder and files that should be ignored. And set base path.
+"""
+self.test_dir = tempfile.mkdtemp()
+self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), 
self.plugin_folder_path)
+file = open(os.path.join(self.plugin_folder_path, 
"test_notload_sub.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()
+file = open(os.path.join(self.plugin_folder_path, 
"subdir1/test_noneload_sub1.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()
+os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+file = open(os.path.join(self.plugin_folder_path, 
"subdir2/test_shouldignore.py"), 'w')
+file.write('raise Exception("This file should have been ignored!")')
+file.close()
+self.mock_plugins_folder = patch.object(
+settings, 'PLUGINS_FOLDER', return_value=self.plugin_folder_path
+   

[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-03 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+"""
+Test load operator
+"""
+@apply_defaults
+def __init__(
+self,
+*args,
+**kwargs):
+super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+def execute(self, context):
+pass

Review comment:
   These files were used as the files that should not be ignored
   (see `self.assertEqual(detected_files, should_not_ignore_files)` at line 87
   of test_plugin_ignore.py).
   
   However, I changed these files so that they are generated by
   test_plugin_ignore.py.









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-02 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448993794



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,47 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.

Review comment:
   I'm sorry.
   It was a simple mistake on my part.
   I have fixed it.









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-02 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448993794



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,47 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.

Review comment:
   I'm sorry.
   It was a simple mistake on my part.
   I have fixed it.









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448459428



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.
+
+:return : file path not to be ignored
+"""
+
+patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+ignore_list_file_path = os.path.join(root, ignore_list_file)
+if os.path.isfile(ignore_list_file_path):
+with open(ignore_list_file_path, 'r') as file:
+lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for 
line in file.read().split("\n")]

Review comment:
   It is not necessary.
   I have fixed it.









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466908



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()
+
+        for file in files:  # type: ignore
+            if file == ignore_list_file:
+                continue
+            file_path = os.path.join(root, str(file))
+            if any([re.findall(p, file_path) for p in patterns]):

Review comment:
   Yes.
   I have fixed it.
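The reviewer's suggestion is not quoted in this message. One common alternative to the commented line, assumed here rather than taken from the pull request, is to call `search` on each compiled pattern inside a generator expression, so that `any` can stop at the first match instead of building a full list of `re.findall` results. `is_ignored` is a hypothetical helper name.

```python
import re
from typing import Iterable, Pattern


def is_ignored(file_path: str, patterns: Iterable[Pattern[str]]) -> bool:
    """Return True as soon as one compiled pattern matches anywhere in the path."""
    # A generator expression (no square brackets) lets any() short-circuit.
    return any(p.search(file_path) for p in patterns)
```

With patterns compiled from `not` and `subdir2`, a path such as `plugins/test_notload_sub.py` would be treated as ignored, while `plugins/test_load.py` would not.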









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466461



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.
+
+:return : file path not to be ignored
+"""
+
+patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+ignore_list_file_path = os.path.join(root, ignore_list_file)
+if os.path.isfile(ignore_list_file_path):
+with open(ignore_list_file_path, 'r') as file:
+lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for 
line in file.read().split("\n")]
+patterns += [re.compile(line) for line in lines_no_comments if 
line]
+patterns = list(set(patterns))
+
+dirs[:] = [
+subdir
+for subdir in dirs
+if not any(p.search(
+os.path.join(os.path.relpath(root, str(base_dir_path)), 
subdir)) for p in patterns)
+]
+
+for subdir in dirs:
+patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()

Review comment:
   This is necessary.
   A regular-expression pattern that is evaluated in a parent directory must also
   be evaluated in that directory's child directories. At least, that is how
   .airflowignore (for DAG selection) is currently specified.
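The inheritance rule described above can be sketched in isolation as follows. This is a simplified illustration, not the code in the pull request: `walk_with_inherited_patterns` and `demo` are hypothetical names, and matching against the path relative to the base directory is an assumption made to keep the example deterministic.

```python
import os
import re
import tempfile
from typing import Dict, Generator, List, Pattern


def walk_with_inherited_patterns(base_dir: str, ignore_file_name: str) -> Generator[str, None, None]:
    """Yield file paths that no pattern from this directory or its ancestors matches."""
    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
    for root, dirs, files in os.walk(base_dir):
        # Start from the patterns handed down by the parent directory.
        patterns = list(patterns_by_dir.get(root, []))
        ignore_path = os.path.join(root, ignore_file_name)
        if os.path.isfile(ignore_path):
            with open(ignore_path) as handle:
                for line in handle:
                    stripped = re.sub(r"\s*#.*", "", line).strip()
                    if stripped:
                        patterns.append(re.compile(stripped))

        def matches(path: str) -> bool:
            relative = os.path.relpath(path, base_dir)
            return any(p.search(relative) for p in patterns)

        # Prune ignored subdirectories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not matches(os.path.join(root, d))]
        # Hand the accumulated patterns down to every remaining child directory.
        for d in dirs:
            patterns_by_dir[os.path.join(root, d)] = list(patterns)
        for name in files:
            if name != ignore_file_name and not matches(os.path.join(root, name)):
                yield os.path.join(root, name)


def demo() -> List[str]:
    """Build a small tree whose only ignore file sits in the parent directory."""
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "child"))
        with open(os.path.join(base, ".airflowignore"), "w") as handle:
            handle.write("# parent rule\nskip_me")
        for rel in ("keep.py", "skip_me.py", "child/keep_too.py", "child/skip_me_too.py"):
            with open(os.path.join(base, rel), "w") as handle:
                handle.write("")
        found = walk_with_inherited_patterns(base, ".airflowignore")
        return sorted(os.path.relpath(p, base) for p in found)
```

In `demo`, the `skip_me` rule is written only in the parent directory, yet `child/skip_me_too.py` is filtered out as well, because the child directory receives a copy of the parent's pattern list before `os.walk` visits it.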
   









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466461



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.
+
+:return : file path not to be ignored
+"""
+
+patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+ignore_list_file_path = os.path.join(root, ignore_list_file)
+if os.path.isfile(ignore_list_file_path):
+with open(ignore_list_file_path, 'r') as file:
+lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for 
line in file.read().split("\n")]
+patterns += [re.compile(line) for line in lines_no_comments if 
line]
+patterns = list(set(patterns))
+
+dirs[:] = [
+subdir
+for subdir in dirs
+if not any(p.search(
+os.path.join(os.path.relpath(root, str(base_dir_path)), 
subdir)) for p in patterns)
+]
+
+for subdir in dirs:
+patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()

Review comment:
   This is necessary.
   A regular-expression pattern that is evaluated in a parent directory must also
   be evaluated in that directory's child directories.
   At least, that is how .airflowignore (for DAG selection) is currently
   specified.
   









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448459428



##
File path: airflow/utils/file.py
##
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
 return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+base_dir_path: str,
+ignore_list_file: str) -> Generator[str, None, None]:
+"""
+Search the file and return the path of the file that should not be ignored.
+:param base_dir_path: the base path to be searched for.
+:param ignore_file_list_name: the file name in which specifies a regular 
expression pattern is written.
+
+:return : file path not to be ignored
+"""
+
+patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+ignore_list_file_path = os.path.join(root, ignore_list_file)
+if os.path.isfile(ignore_list_file_path):
+with open(ignore_list_file_path, 'r') as file:
+lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for 
line in file.read().split("\n")]

Review comment:
   It is not necessary.
   I have also consolidated the wasteful loops into a single pass!
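As a rough sketch of what a single pass might look like: the updated code is not shown in this message, so the function below is an assumption, and `COMMENT_PATTERN` and `compile_ignore_patterns` are hypothetical names. It strips comments and compiles each remaining line in one loop, with the comment regex compiled once outside the loop.

```python
import re
from typing import List, Pattern

# Compiled once, instead of once per line inside a list comprehension.
COMMENT_PATTERN = re.compile(r"\s*#.*")


def compile_ignore_patterns(text: str) -> List[Pattern[str]]:
    """Strip comments and compile every non-empty line in a single loop."""
    patterns: List[Pattern[str]] = []
    for line in text.split("\n"):
        line_no_comment = COMMENT_PATTERN.sub("", line)
        if line_no_comment:
            patterns.append(re.compile(line_no_comment))
    return patterns
```

For the ignore-file content used in the test, `"#ignore test\nnot\nsubdir2"`, this yields two compiled patterns, `not` and `subdir2`.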









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448457433



##
File path: airflow/plugins_manager.py
##
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue
+
+            log.info('Importing plugin module %s', file_path)
+
+            loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
+            spec = importlib.util.spec_from_loader(mod_name, loader)
+            mod = importlib.util.module_from_spec(spec)
+            sys.modules[spec.name] = mod
+            loader.exec_module(mod)
+            for mod_attr_value in list(mod.__dict__.values()):
+                if is_valid_plugin(mod_attr_value):
+                    plugin_instance = mod_attr_value()
+                    plugins.append(plugin_instance)

Review comment:
   Beautiful code!
   I have incorporated this.
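
For readers following the hunk above, the import steps it quotes can be isolated as a small standalone sketch. The wrapper name `load_module_from_path` is hypothetical; the `importlib` calls are the same ones shown in the diff.

```python
import importlib.machinery
import importlib.util
import os
import sys
from types import ModuleType


def load_module_from_path(file_path: str) -> ModuleType:
    """Import a Python source file as a module named after the file."""
    # "dir/my_plugin.py" -> module name "my_plugin"
    mod_name, _ = os.path.splitext(os.path.split(file_path)[-1])

    loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
    spec = importlib.util.spec_from_loader(mod_name, loader)
    mod = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring the order used in the diff.
    sys.modules[spec.name] = mod
    loader.exec_module(mod)
    return mod
```

Because the module name is derived from the file name alone, two files with the same name in different folders would overwrite each other in `sys.modules`; the sketch does not try to address that.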









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448456168



##
File path: airflow/plugins_manager.py
##
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue

Review comment:
   No.
   I fixed it!










[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448453473



##
File path: airflow/plugins_manager.py
##
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):

Review comment:
   It is not necessary.
   I fixed it.
   









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

2020-07-01 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448452969



##
File path: airflow/plugins_manager.py
##
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"

Review comment:
   It's not particularly necessary.
   I fixed it!









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

2020-06-28 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446663722



##
File path: docs/concepts.rst
##
@@ -1465,3 +1465,13 @@ would not be scanned by Airflow at all. This improves efficiency of DAG finding)
 The scope of a ``.airflowignore`` file is the directory it is in plus all its subfolders.
 You can also prepare ``.airflowignore`` file for a subfolder in ``DAG_FOLDER`` and it
 would only be applicable for that subfolder.
+
+
+.pluginignore
+'''''''''''''
+
+A ``.pluginignore`` file specifies the directories or files in ``PLUGINS_FOLDER``

Review comment:
   I agree with your opinion:
   > In my opinion, you can use .airflowignore and it will be easier to use.
   
   I have changed `.pluginignore` to `.airflowignore`. :-)
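
The scoping rule quoted in the docs hunk above (an ignore file covers its own directory plus all its subfolders) can be illustrated with a small sketch. The function name `patterns_in_scope` and the `patterns_by_dir` mapping from directory to pattern strings are hypothetical, chosen only for this illustration.

```python
import os
from typing import Dict, List


def patterns_in_scope(file_path: str, patterns_by_dir: Dict[str, List[str]]) -> List[str]:
    """Collect the patterns of every ignore file in an ancestor directory."""
    collected: List[str] = []
    directory = os.path.dirname(file_path)
    while True:
        # Patterns declared in this directory apply to file_path.
        collected.extend(patterns_by_dir.get(directory, []))
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the top of the path
            break
        directory = parent
    return collected
```

A file in a subfolder is therefore subject to its own folder's patterns and to those of every folder above it, while a file in the parent folder never sees the subfolder's patterns.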









[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

2020-06-27 Thread GitBox


j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446508436



##
File path: airflow/plugins_manager.py
##
@@ -164,8 +163,28 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
     # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+    for root, dirs, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+        ignore_file = os.path.join(root, '.pluginignore')
+
+        if os.path.isfile(ignore_file):
+            with open(ignore_file, 'r') as file:

Review comment:
   You are right.
   In fact, this is pretty much the same code.
   
   I made a generator. What do you think of this?
   (Perhaps the generator function could also be shared with file.py.)
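
A rough sketch of the kind of generator being discussed, assuming each non-comment line of the ignore file is a regular expression searched against the full path. The name `find_paths` is hypothetical and this is not the PR's actual `find_path_from_directory` implementation; skipping the ignore file itself is a choice made only for this sketch.

```python
import os
import re
from typing import Dict, Generator, List, Pattern


def find_paths(base_dir: str, ignore_file_name: str) -> Generator[str, None, None]:
    """Yield file paths under base_dir that no ignore pattern matches."""
    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}

    for root, dirs, files in os.walk(base_dir, followlinks=True):
        # Start from a copy of the patterns inherited from parent directories.
        patterns = list(patterns_by_dir.get(root, []))

        ignore_file_path = os.path.join(root, ignore_file_name)
        if os.path.isfile(ignore_file_path):
            with open(ignore_file_path, "r") as file:
                for line in file:
                    line = re.sub(r"\s*#.*", "", line).strip()
                    if line:
                        patterns.append(re.compile(line))

        # Prune ignored subdirectories in place so os.walk never enters them.
        dirs[:] = [
            d for d in dirs
            if not any(p.search(os.path.join(root, d)) for p in patterns)
        ]
        # Hand the accumulated patterns down to the remaining subdirectories.
        for d in dirs:
            patterns_by_dir[os.path.join(root, d)] = patterns

        for f in files:
            if f == ignore_file_name:
                continue  # sketch-specific: never yield the ignore file itself
            file_path = os.path.join(root, f)
            if not any(p.search(file_path) for p in patterns):
                yield file_path
```

Written this way, the caller only iterates over the yielded paths, so the same walk could serve both DAG discovery and plugin loading.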








