[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-16896:
----------------------------------
    Labels:   (was: TODOC3.0)

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
>
>
> We do not want to create too many tasks in memory in the analysis phase while
> loading data. Currently we load all the files in the bootstrap dump location
> as {{FileStatus[]}} and then iterate over it to load objects. We should
> rather move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would internally batch and return values.
>
> Additionally, since we can't hand off partial tasks from the analysis phase to
> the execution phase, we are going to move the whole repl load functionality to
> the execution phase so we can better control creation/execution of tasks (not
> related to hive {{Task}}; we may get rid of ReplCopyTask).
>
> An additional consideration to take into account at the end of this jira is
> whether we want to specifically do a multi-threaded load of the bootstrap dump.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
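The description above proposes replacing an eagerly loaded {{FileStatus[]}} with an iterator that batches internally. The following is a minimal JDK-only sketch of that idea, not Hadoop's or Hive's actual implementation: the class name BatchedListing, the (offset, limit) page fetcher, and the fetchCount() helper are all hypothetical names introduced for illustration. It only shows how a caller can walk a large listing while holding at most one batch in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/**
 * Hypothetical stand-in for a listing iterator that fetches entries in batches.
 * This is an illustration only; it is not org.apache.hadoop.fs.RemoteIterator.
 */
final class BatchedListing<T> implements Iterator<T> {
    // Assumed page source: given (offset, limit), returns up to `limit` entries.
    private final BiFunction<Integer, Integer, List<T>> pageFetcher;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int offset = 0;
    private int fetchCount = 0;
    private boolean exhausted = false;

    BatchedListing(BiFunction<Integer, Integer, List<T>> pageFetcher, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.pageFetcher = pageFetcher;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only when the current one has been consumed.
        if (buffer.isEmpty() && !exhausted) {
            List<T> page = pageFetcher.apply(offset, batchSize);
            fetchCount++;
            offset += page.size();
            buffer.addAll(page);
            // A short page means the source has no more entries.
            if (page.size() < batchSize) {
                exhausted = true;
            }
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.poll();
    }

    /** Number of batches requested from the page source so far. */
    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        // Simulated dump directory with 10 entries (hypothetical paths).
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            all.add("dump/file_" + i);
        }
        BatchedListing<String> it = new BatchedListing<>(
            (off, lim) -> all.subList(Math.min(off, all.size()),
                                      Math.min(off + lim, all.size())),
            4);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " files in " + it.fetchCount() + " batches");
    }
}
```

With an array-based listing, every entry exists in memory before the first one is processed; with the iterator, the caller decides how far to advance, which is what makes it possible to create and run load tasks incrementally during execution.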
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-16896:
----------------------------------
    Labels: TODOC3.0  (was: )

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>              Labels: TODOC3.0
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated HIVE-16896:
------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Patch pushed to master.

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment: HIVE-16896.3.patch

Rebasing from master and resolving conflicts.

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment:     (was: HIVE-16896.3.patch)

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment:     (was: HIVE-16896.3.patch)

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment: HIVE-16896.3.patch

Reattaching the patch so the build can be triggered.

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment: HIVE-16896.3.patch

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment: HIVE-16896.2.patch

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Status: Patch Available  (was: In Progress)

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment: HIVE-16896.1.patch

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: HIVE-16865

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek