[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-09-08 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16896:
--
Labels:   (was: TODOC3.0)

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-08 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16896:
--
Labels: TODOC3.0  (was: )

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16896:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch pushed to master.

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-07 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: HIVE-16896.3.patch

rebasing from master and resolving conflicts.

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-07 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: (was: HIVE-16896.3.patch)

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-03 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: (was: HIVE-16896.3.patch)

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-03 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: HIVE-16896.3.patch

reattaching the patch so build can be triggered.

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch, HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-03 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: HIVE-16896.3.patch

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-08-02 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: HIVE-16896.2.patch

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-07-31 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Status: Patch Available  (was: In Progress)

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-07-31 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Attachment: HIVE-16896.1.patch

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16896.1.patch
>
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
>
> we want to not create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects, we should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIteratorlistFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> additionally since we cant hand off partial tasks from analysis pahse => 
> execution phase, we are going to move the whole repl load functionality to 
> execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}, we may get rid of ReplCopyTask)
> Additional consideration to take into account at the end of this jira is to 
> see if we want to specifically do a multi threaded load of bootstrap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)