[jira] [Commented] (AIRFLOW-58) Add bulk_dump abstract method to DbApiHook
[ https://issues.apache.org/jira/browse/AIRFLOW-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305057#comment-15305057 ]

Lance Norskog commented on AIRFLOW-58:
--------------------------------------

Copying data in bulk turns out to be a really messy problem. It will take a lot of work and air(flow) time before you have a reasonably complete solution. For example, you're going to want to save a bulk copy task as a file and then restart the task in the middle.

We use the Embulk project to do table-to-table copies. We packaged Embulk as a web service and call it from a custom Operator. I can look into open-sourcing our work. Embulk is a Java app, so the web service needs a multi-gig machine to run. [http://www.embulk.org/docs/]

We would prefer to use the Sqoop program, but that's off-limits since we're converting from Hadoop.

> Add bulk_dump abstract method to DbApiHook
> ------------------------------------------
>
>                 Key: AIRFLOW-58
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-58
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: Airflow 1.7.0
>            Reporter: Bence Nagy
>            Assignee: Bence Nagy
>            Priority: Trivial
>
> I just see no reason for having a method for bulk loading but not for the
> inverse.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
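Whatever a complete bulk-copy story looks like, the change AIRFLOW-58 itself asks for is small; a minimal sketch of the requested abstract method alongside its existing inverse (shape modeled on the 1.7-era `DbApiHook` described in this thread — the details here are illustrative, not the committed API):

```python
class DbApiHook(object):
    """Illustrative excerpt of a DB-API hook base class."""

    def bulk_load(self, table, tmp_file):
        """Load a tab-delimited file into a database table.

        Concrete hooks override this; the base class only defines
        the contract.
        """
        raise NotImplementedError()

    def bulk_dump(self, table, tmp_file):
        """Dump a database table into a tab-delimited file.

        The inverse of bulk_load, as requested in AIRFLOW-58. A MySQL
        hook might implement it with SELECT ... INTO OUTFILE.
        """
        raise NotImplementedError()
```

Subclasses that cannot support one direction simply leave the `NotImplementedError` in place, which keeps the two methods symmetric.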
[jira] [Reopened] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Bodley reopened AIRFLOW-179:
---------------------------------

> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-179
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>            Reporter: John Bodley
>            Assignee: John Bodley
>             Fix For: Airflow 1.8
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to
> strings using the ASCII codec, which is problematic if the cell contains
> non-ASCII characters, e.g.
>
>     >>> from airflow.hooks import DbApiHook
>     >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>         return "'" + str(cell).replace("'", "''") + "'"
>       File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>         return super(newstr, cls).__new__(cls, value)
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
>
> Rather than manually serializing and escaping values to an ASCII string,
> one should serialize the value to a string using the character set of the
> corresponding target database, leveraging the connection to mutate the
> object into the SQL string literal.
>
> Additionally the escaping logic for single quotes (') within the
> _serialize_cell method seems wrong, i.e.
>
>     str(cell).replace("'", "''")
>
> would escape the string "you're" to "'you''re'" as opposed to "'you\'re'".
>
> Note an exception should still be thrown if the target encoding is not
> compatible with the source encoding.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
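The failure mode is easy to reproduce outside Airflow. A minimal Python 3 sketch of the charset-aware behavior the reporter asks for — serialize with the target database's character set and fail loudly when the value can't be represented (the function name and signature are illustrative, not Airflow's API):

```python
def serialize_cell(cell, encoding="utf-8"):
    """Render a value as a SQL string literal, validating it against a
    target character set (illustrative sketch, not Airflow's API)."""
    if cell is None:
        return "NULL"
    text = str(cell)
    # Raise UnicodeEncodeError if the target encoding cannot represent
    # the value, as the issue requests.
    text.encode(encoding)
    # Standard SQL escaping: double any embedded single quotes.
    return "'" + text.replace("'", "''") + "'"
```

With a UTF-8 target, `serialize_cell('Nguyễn Tấn Dũng')` succeeds, while the same call with `encoding='ascii'` raises `UnicodeEncodeError` instead of silently corrupting the value.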
[jira] [Resolved] (AIRFLOW-186) conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arthur Wiedmer resolved AIRFLOW-186.
------------------------------------
    Resolution: Fixed

> conn.literal is specific to MySQLdb, and should be factored out of the
> dbapi_hook
> ----------------------------------------------------------------------
>
>                 Key: AIRFLOW-186
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-186
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Arthur Wiedmer
>            Assignee: Arthur Wiedmer
>   Original Estimate: 4h
>  Remaining Estimate: 4h

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304585#comment-15304585 ]

ASF subversion and git services commented on AIRFLOW-179:
---------------------------------------------------------

Commit 8f63640584ca2dcd15bcd361d1f9a0d995bad315 in incubator-airflow's branch refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8f63640 ]

Revert "[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters"

This reverts commit 87b4b8fa19cb660317198d74f6d51fdde0a7e067.

Reverting as the method used in the dbapi hook is actually package-specific to MySQLdb and would break the sqlite and mssql hooks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
incubator-airflow git commit: Revert "[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters"
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 87b4b8fa1 -> 8f6364058

Revert "[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters"

This reverts commit 87b4b8fa19cb660317198d74f6d51fdde0a7e067.

Reverting as the method used in the dbapi hook is actually package-specific to MySQLdb and would break the sqlite and mssql hooks.

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8f636405
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8f636405
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8f636405

Branch: refs/heads/master
Commit: 8f63640584ca2dcd15bcd361d1f9a0d995bad315
Parents: 87b4b8f
Author: Arthur Wiedmer
Authored: Fri May 27 11:38:57 2016 -0700
Committer: Arthur Wiedmer
Committed: Fri May 27 11:38:57 2016 -0700

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py | 21 +++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8f636405/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index 9e128a2..e5de92e 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -1,5 +1,8 @@
+from builtins import str
 from past.builtins import basestring

+from datetime import datetime
+import numpy
 import logging

 from airflow.hooks.base_hook import BaseHook
@@ -168,7 +171,10 @@ class DbApiHook(BaseHook):
         i = 0
         for row in rows:
             i += 1
-            values = [conn.literal(cell) for cell in row]
+            l = []
+            for cell in row:
+                l.append(self._serialize_cell(cell))
+            values = tuple(l)
             sql = "INSERT INTO {0} {1} VALUES ({2});".format(
                 table,
                 target_fields,
@@ -184,6 +190,19 @@ class DbApiHook(BaseHook):
         logging.info(
             "Done loading. Loaded a total of {i} rows".format(**locals()))

+    @staticmethod
+    def _serialize_cell(cell):
+        if isinstance(cell, basestring):
+            return "'" + str(cell).replace("'", "''") + "'"
+        elif cell is None:
+            return 'NULL'
+        elif isinstance(cell, numpy.datetime64):
+            return "'" + str(cell) + "'"
+        elif isinstance(cell, datetime):
+            return "'" + cell.isoformat() + "'"
+        else:
+            return str(cell)
+
     def bulk_dump(self, table, tmp_file):
         """
         Dumps a database table into a tab-delimited file
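The `_serialize_cell` helper shown in the diff is easy to exercise on its own; here is a Python 3 transcription for illustration (dropping the Python 2 `basestring` and the `numpy.datetime64` branch, so this is a sketch rather than the committed code):

```python
from datetime import datetime


def serialize_cell(cell):
    # Standalone Python 3 transcription of the _serialize_cell
    # staticmethod from the diff above (str replaces basestring;
    # the numpy.datetime64 branch is omitted for self-containment).
    if isinstance(cell, str):
        # SQL-style escaping: double embedded single quotes.
        return "'" + cell.replace("'", "''") + "'"
    elif cell is None:
        return 'NULL'
    elif isinstance(cell, datetime):
        return "'" + cell.isoformat() + "'"
    else:
        return str(cell)
```

Strings are quoted with doubled single quotes, `None` becomes `NULL`, datetimes are ISO-formatted, and everything else falls through to `str()` — which is exactly the behavior the revert discussion centers on, since it hard-codes one literal syntax instead of asking the driver.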
[jira] [Created] (AIRFLOW-187) Make PR tool more user-friendly
Jeremiah Lowin created AIRFLOW-187:
--------------------------------------

             Summary: Make PR tool more user-friendly
                 Key: AIRFLOW-187
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-187
             Project: Apache Airflow
          Issue Type: Improvement
          Components: PR tool
            Reporter: Jeremiah Lowin
            Priority: Minor

General JIRA improvement that can be referenced for any UX improvements to the PR tool, including better or more prompts, documentation, etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-186) conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304496#comment-15304496 ]

Arthur Wiedmer commented on AIRFLOW-186:
----------------------------------------

[~john.bod...@gmail.com] FYI

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (AIRFLOW-186) conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook
Arthur Wiedmer created AIRFLOW-186:
--------------------------------------

             Summary: conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook
                 Key: AIRFLOW-186
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-186
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Arthur Wiedmer
            Assignee: Arthur Wiedmer

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini closed AIRFLOW-179.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Airflow 1.8

+1 Merged. Thanks!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304466#comment-15304466 ]

ASF subversion and git services commented on AIRFLOW-179:
---------------------------------------------------------

Commit 87b4b8fa19cb660317198d74f6d51fdde0a7e067 in incubator-airflow's branch refs/heads/master from [~john.bod...@gmail.com]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=87b4b8f ]

[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-179

In addition to correctly serializing non-ASCII characters, the literal transformation also corrects an issue with escaping single quotes (').

Note it was my intention to add another unit test to `test_hive_to_mysql` in `tests/core.py`, however on inspection the indentation of the various methods seemed wrong: methods are nested and it's not apparent what class they refer to. Additionally it seems a number of the test cases aren't related to the corresponding class.

For testing purposes I simply ran a pipeline which previously failed with the following exception,

    [2016-05-26 22:03:39,256] {models.py:1286} ERROR - 'ascii' codec can't decode byte 0xc3 in position 230: ordinal not in range(128)
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1245, in run
        result = task_copy.execute(context=context)
      File "/usr/local/lib/python2.7/dist-packages/airflow/operators/hive_to_mysql.py", line 88, in execute
        mysql.insert_rows(table=self.mysql_table, rows=results)
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 176, in insert_rows
        l.append(self._serialize_cell(cell))
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
        return "'" + str(cell).replace("'", "''") + "'"
      File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
        return super(newstr, cls).__new__(cls, value)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 230: ordinal not in range(128)

and verified with the presence of the fix that the task succeeded and the resulting output was correct.

Note currently from grokking the code base it seems that only `MySqlHook` objects call the `insert_rows` method.

Author: John Bodley

Closes #1550 from johnbodley/dbapi_hook_serialization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
incubator-airflow git commit: [AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 4aac54d64 -> 87b4b8fa1

[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-179

In addition to correctly serializing non-ASCII characters, the literal transformation also corrects an issue with escaping single quotes (').

Note it was my intention to add another unit test to `test_hive_to_mysql` in `tests/core.py`, however on inspection the indentation of the various methods seemed wrong: methods are nested and it's not apparent what class they refer to. Additionally it seems a number of the test cases aren't related to the corresponding class.

For testing purposes I simply ran a pipeline which previously failed with the following exception,

    [2016-05-26 22:03:39,256] {models.py:1286} ERROR - 'ascii' codec can't decode byte 0xc3 in position 230: ordinal not in range(128)
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1245, in run
        result = task_copy.execute(context=context)
      File "/usr/local/lib/python2.7/dist-packages/airflow/operators/hive_to_mysql.py", line 88, in execute
        mysql.insert_rows(table=self.mysql_table, rows=results)
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 176, in insert_rows
        l.append(self._serialize_cell(cell))
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
        return "'" + str(cell).replace("'", "''") + "'"
      File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
        return super(newstr, cls).__new__(cls, value)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 230: ordinal not in range(128)

and verified with the presence of the fix that the task succeeded and the resulting output was correct.

Note currently from grokking the code base it seems that only `MySqlHook` objects call the `insert_rows` method.

Author: John Bodley

Closes #1550 from johnbodley/dbapi_hook_serialization.

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/87b4b8fa
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/87b4b8fa
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/87b4b8fa

Branch: refs/heads/master
Commit: 87b4b8fa19cb660317198d74f6d51fdde0a7e067
Parents: 4aac54d
Author: John Bodley
Authored: Fri May 27 10:58:07 2016 -0700
Committer: Chris Riccomini
Committed: Fri May 27 10:58:07 2016 -0700

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/87b4b8fa/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index e5de92e..9e128a2 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -1,8 +1,5 @@
-from builtins import str
 from past.builtins import basestring

-from datetime import datetime
-import numpy
 import logging

 from airflow.hooks.base_hook import BaseHook
@@ -171,10 +168,7 @@ class DbApiHook(BaseHook):
         i = 0
         for row in rows:
             i += 1
-            l = []
-            for cell in row:
-                l.append(self._serialize_cell(cell))
-            values = tuple(l)
+            values = [conn.literal(cell) for cell in row]
             sql = "INSERT INTO {0} {1} VALUES ({2});".format(
                 table,
                 target_fields,
@@ -190,19 +184,6 @@ class DbApiHook(BaseHook):
         logging.info(
             "Done loading. Loaded a total of {i} rows".format(**locals()))

-    @staticmethod
-    def _serialize_cell(cell):
-        if isinstance(cell, basestring):
-            return "'" + str(cell).replace("'", "''") + "'"
-        elif cell is None:
-            return 'NULL'
-        elif isinstance(cell, numpy.datetime64):
-            return "'" + str(cell) + "'"
-        elif isinstance(cell, datetime):
-            return "'" + cell.isoformat() + "'"
-        else:
-            return str(cell)
-
     def bulk_dump(self, table, tmp_file):
         """
         Dumps a database table into a tab-delimited file
[jira] [Closed] (AIRFLOW-183) webserver not retrieving from remote s3/gcs if a log file has been deleted from a remote worker
[ https://issues.apache.org/jira/browse/AIRFLOW-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini closed AIRFLOW-183.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Airflow 1.8

> webserver not retrieving from remote s3/gcs if a log file has been deleted
> from a remote worker
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-183
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-183
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>    Affects Versions: Airflow 1.7.1
>            Reporter: Yap Sok Ann
>            Assignee: Yap Sok Ann
>            Priority: Minor
>             Fix For: Airflow 1.8

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-183) webserver not retrieving from remote s3/gcs if a log file has been deleted from a remote worker
[ https://issues.apache.org/jira/browse/AIRFLOW-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304403#comment-15304403 ]

Chris Riccomini commented on AIRFLOW-183:
-----------------------------------------

+1 merged

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-183) webserver not retrieving from remote s3/gcs if a log file has been deleted from a remote worker
[ https://issues.apache.org/jira/browse/AIRFLOW-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304400#comment-15304400 ]

ASF subversion and git services commented on AIRFLOW-183:
---------------------------------------------------------

Commit 4aac54d64437ad4aae3020de6debabbc9e911709 in incubator-airflow's branch refs/heads/master from [~sayap]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4aac54d ]

[AIRFLOW-183] Fetch log from remote when worker returns 4xx/5xx response

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-183

This is mainly to make the behavior consistent when some log files have been deleted from the log folder. Without the change, the remote s3/gcs fallback will only trigger if the task ran on the local worker.

Author: Yap Sok Ann

Closes #1551 from sayap/remote-log-remote-worker.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
incubator-airflow git commit: [AIRFLOW-183] Fetch log from remote when worker returns 4xx/5xx response
Repository: incubator-airflow
Updated Branches:
  refs/heads/master afcd4fcf0 -> 4aac54d64

[AIRFLOW-183] Fetch log from remote when worker returns 4xx/5xx response

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-183

This is mainly to make the behavior consistent when some log files have been deleted from the log folder. Without the change, the remote s3/gcs fallback will only trigger if the task ran on the local worker.

Author: Yap Sok Ann

Closes #1551 from sayap/remote-log-remote-worker.

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4aac54d6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4aac54d6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4aac54d6

Branch: refs/heads/master
Commit: 4aac54d64437ad4aae3020de6debabbc9e911709
Parents: afcd4fc
Author: Yap Sok Ann
Authored: Fri May 27 10:25:25 2016 -0700
Committer: Chris Riccomini
Committed: Fri May 27 10:25:25 2016 -0700

----------------------------------------------------------------------
 airflow/www/views.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4aac54d6/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 78f9677..bba417b 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -856,7 +856,9 @@ class Airflow(BaseView):
             log += "*** Fetching here: {url}\n".format(**locals())
             try:
                 import requests
-                log += '\n' + requests.get(url).text
+                response = requests.get(url)
+                response.raise_for_status()
+                log += '\n' + response.text
                 log_loaded = True
             except:
                 log += "*** Failed to fetch log file from worker.\n".format(
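The two added lines change the failure mode: a 4xx/5xx from the worker now raises inside the try block (via `requests.Response.raise_for_status`) instead of appending the error body to the log, so the except branch runs and the remote s3/gcs fallback can kick in. The same pattern can be sketched without the web-server context — `FakeResponse` and `fetch_worker_log` below are illustrative stand-ins, not Airflow code:

```python
class FakeResponse(object):
    """Minimal stand-in for requests.Response, for illustration."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def raise_for_status(self):
        # Mirrors requests' behavior: raise on a 4xx/5xx status.
        if self.status_code >= 400:
            raise IOError("HTTP error %d" % self.status_code)


def fetch_worker_log(get, url):
    """Fetch a task log from a worker; return (text, loaded) so the
    caller can fall back to remote storage when loaded is False."""
    try:
        response = get(url)
        response.raise_for_status()  # turn 4xx/5xx into an exception
        return response.text, True
    except Exception:
        return "*** Failed to fetch log file from worker.\n", False
```

Without `raise_for_status`, a 404 page would have been treated as a successful fetch and `log_loaded` set to True, which is exactly why the remote fallback never triggered for logs deleted from a remote worker.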
[jira] [Created] (AIRFLOW-185) Fix bug when no branches can be discovered automatically
Jeremiah Lowin created AIRFLOW-185:
--------------------------------------

             Summary: Fix bug when no branches can be discovered automatically
                 Key: AIRFLOW-185
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-185
             Project: Apache Airflow
          Issue Type: Bug
          Components: PR tool
            Reporter: Jeremiah Lowin
            Assignee: Jeremiah Lowin
            Priority: Minor

A function is called on the list of version branches which expects there to be at least one version available. The function call should be skipped if the list is empty.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
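The fix described is essentially a guard clause. A minimal sketch of the shape (the function and variable names are illustrative, not the PR tool's actual code):

```python
def latest_version(version_branches):
    """Return the highest version branch, or None if none were
    discovered (illustrative sketch of the AIRFLOW-185 guard)."""
    # max() raises ValueError on an empty sequence, which is the kind
    # of crash described; skip the call when the list is empty.
    if not version_branches:
        return None
    return max(version_branches)
```

With the guard, callers get `None` and can prompt the user for a branch instead of crashing when automatic discovery finds nothing.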
[jira] [Closed] (AIRFLOW-177) Resume a failed dag
[ https://issues.apache.org/jira/browse/AIRFLOW-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-177. --- Resolution: Information Provided I'm closing, and I've opened AIRFLOW-184 to define the real ask. > Resume a failed dag > --- > > Key: AIRFLOW-177 > URL: https://issues.apache.org/jira/browse/AIRFLOW-177 > Project: Apache Airflow > Issue Type: New Feature > Components: core >Reporter: Sumit Maheshwari > > Say I have a DAG with 10 nodes, and one of its runs failed at the 5th node. If I want to > resume that run, I currently have to go and re-run the failed tasks individually, one by one. > Is there any way to just supply the dag_id and execution_date (or run_id) and have it > automatically retry only the failed tasks? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
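What the reporter asks for amounts to clearing only the failed task instances of a run so the scheduler picks them up again. A toy sketch using plain dicts, not the real TaskInstance model:

```python
# Toy model of "retry only failed tasks" for one dag run.
# `task_instances` maps task_id -> state; this loosely mirrors what
# clearing does to TaskInstance rows in the metadata database.

def clear_failed(task_instances):
    """Reset failed tasks to no state so the scheduler reschedules them."""
    for task_id, state in task_instances.items():
        if state == "failed":
            task_instances[task_id] = None  # no state => eligible to run
    return task_instances
```

Successful tasks keep their state, so only the failed portion of the run is retried.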
[jira] [Created] (AIRFLOW-184) Add clear/mark success to CLI
Chris Riccomini created AIRFLOW-184: --- Summary: Add clear/mark success to CLI Key: AIRFLOW-184 URL: https://issues.apache.org/jira/browse/AIRFLOW-184 Project: Apache Airflow Issue Type: Bug Components: cli Reporter: Chris Riccomini AIRFLOW-177 pointed out that the current CLI does not allow us to clear a task or mark it as successful (including upstream, downstream, past, future, and recursive) the way the UI widget does. Given the goal of keeping parity between the UI and CLI, it seems we should support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
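The upstream/downstream selection the UI widget performs is essentially a graph traversal from the chosen task. A sketch under the assumption that the DAG is given as a simple adjacency map (hypothetical representation, not the real DAG model):

```python
from collections import deque

def select_tasks(dag, task_id, upstream=False, downstream=False):
    """Return the set of task_ids a clear/mark-success action would touch.

    `dag` maps each task_id to its direct downstream task_ids.
    """
    # Build the reverse edges for upstream traversal.
    parents = {t: set() for t in dag}
    for t, children in dag.items():
        for c in children:
            parents[c].add(t)

    selected = {task_id}
    if downstream:  # breadth-first over downstream edges
        queue = deque([task_id])
        while queue:
            for child in dag[queue.popleft()]:
                if child not in selected:
                    selected.add(child)
                    queue.append(child)
    if upstream:  # breadth-first over upstream edges
        queue = deque([task_id])
        while queue:
            for parent in parents[queue.popleft()]:
                if parent not in selected:
                    selected.add(parent)
                    queue.append(parent)
    return selected
```

A CLI flag pair like the UI's upstream/downstream checkboxes would map directly onto the two keyword arguments.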
[jira] [Commented] (AIRFLOW-177) Resume a failed dag
[ https://issues.apache.org/jira/browse/AIRFLOW-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304355#comment-15304355 ] Chris Riccomini commented on AIRFLOW-177: - What might be more useful is to expose all the functionality that the UI widget offers (clear, mark success, etc. for upstream, downstream, past, future). This is in keeping with [~maxime.beauche...@apache.org]'s goal of having parity between the UI and CLI. If you want to send a PR, that'd be great. > Resume a failed dag > --- > > Key: AIRFLOW-177 > URL: https://issues.apache.org/jira/browse/AIRFLOW-177 > Project: Apache Airflow > Issue Type: New Feature > Components: core >Reporter: Sumit Maheshwari > > Say I have a DAG with 10 nodes, and one of its runs failed at the 5th node. If I want to > resume that run, I currently have to go and re-run the failed tasks individually, one by one. > Is there any way to just supply the dag_id and execution_date (or run_id) and have it > automatically retry only the failed tasks? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-183) webserver not retrieving from remote s3/gcs if a log file has been deleted from a remote worker
[ https://issues.apache.org/jira/browse/AIRFLOW-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-183: Assignee: Yap Sok Ann > webserver not retrieving from remote s3/gcs if a log file has been deleted > from a remote worker > --- > > Key: AIRFLOW-183 > URL: https://issues.apache.org/jira/browse/AIRFLOW-183 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: Airflow 1.7.1 >Reporter: Yap Sok Ann >Assignee: Yap Sok Ann >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-183) webserver not retrieving from remote s3/gcs if a log file has been deleted from a remote worker
[ https://issues.apache.org/jira/browse/AIRFLOW-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-183: External issue URL: https://github.com/apache/incubator-airflow/pull/1551 > webserver not retrieving from remote s3/gcs if a log file has been deleted > from a remote worker > --- > > Key: AIRFLOW-183 > URL: https://issues.apache.org/jira/browse/AIRFLOW-183 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: Airflow 1.7.1 >Reporter: Yap Sok Ann >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-91) Ssl gunicorn
[ https://issues.apache.org/jira/browse/AIRFLOW-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304064#comment-15304064 ] Stanilovsky Evgeny commented on AIRFLOW-91: --- One more fix: https://github.com/apache/incubator-airflow/pull/1497#issuecomment-220943087 > Ssl gunicorn > > > Key: AIRFLOW-91 > URL: https://issues.apache.org/jira/browse/AIRFLOW-91 > Project: Apache Airflow > Issue Type: Improvement > Components: security >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny > > old issue: https://github.com/apache/incubator-airflow/pull/1492 > SSL support for gunicorn -- This message was sent by Atlassian JIRA (v6.3.4#6332)
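On the gunicorn side, TLS termination is configured through gunicorn's own `certfile` and `keyfile` settings; how (or whether) Airflow's webserver wires these through is what the linked PR addresses. A minimal gunicorn config-file sketch, with placeholder paths:

```python
# gunicorn_ssl.py -- minimal gunicorn config sketch (loaded via
# `gunicorn -c gunicorn_ssl.py app:application`). The certificate and
# key paths below are placeholders, not real files.

bind = "0.0.0.0:8443"
certfile = "/path/to/cert.pem"  # placeholder: server certificate
keyfile = "/path/to/key.pem"    # placeholder: private key
```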