[jira] [Updated] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-3388: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Rongrong! > Improve Performance of UDF PERCENTILE_APPROX() > -- > > Key: HIVE-3388 > URL: https://issues.apache.org/jira/browse/HIVE-3388 > Project: Hive > Issue Type: Task >Reporter: Rongrong Zhong >Assignee: Rongrong Zhong >Priority: Minor > Attachments: HIVE-3388.1.patch.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192 ] Siying Dong commented on HIVE-3388: --- +1 > Improve Performance of UDF PERCENTILE_APPROX() > -- > > Key: HIVE-3388 > URL: https://issues.apache.org/jira/browse/HIVE-3388 > Project: Hive > Issue Type: Task >Reporter: Rongrong Zhong >Assignee: Rongrong Zhong >Priority: Minor > Attachments: HIVE-3388.1.patch.txt
[jira] [Resolved] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2247. --- Resolution: Fixed I committed the patch 7 months ago but forgot to resolve the issue. Thanks Weiyan! > ALTER TABLE RENAME PARTITION > > > Key: HIVE-2247 > URL: https://issues.apache.org/jira/browse/HIVE-2247 > Project: Hive > Issue Type: New Feature >Reporter: Siying Dong >Assignee: Weiyan Wang > Attachments: HIVE-2247.10.patch.txt, HIVE-2247.11.patch.txt, HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt, HIVE-2247.9.patch.txt, HIVE-2247.9.patch.txt > > > We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME.
[jira] [Resolved] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-3030. --- Resolution: Fixed > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281137#comment-13281137 ] Siying Dong commented on HIVE-3030: --- Committed. Thanks Namit! > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280470#comment-13280470 ] Siying Dong commented on HIVE-3030: --- Tests look good to me. Will run the test suites. Let's open a follow-up JIRA to escape a more complete list of characters. > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280399#comment-13280399 ] Siying Dong commented on HIVE-3030: --- Discussed with Namit offline. He is going to add one more test case now. > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280294#comment-13280294 ] Siying Dong commented on HIVE-3030: --- Logic looks good to me. I'll run unit tests now. In the meantime, can you add tests to cover those new cases? For example, escaping '\', and unescaping inputs like '\\', '\\\t' or '\\\t'? > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278020#comment-13278020 ] Siying Dong commented on HIVE-3030: --- I meant "Maybe not for this patch but as a follow-up, we might want to escape \ too to keep the escaping mapping a complete one." > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278018#comment-13278018 ] Siying Dong commented on HIVE-3030: --- Here is a general problem (maybe not related to the new change in this patch): there is no way to output a literal "\n" back to Hive. It will be translated to a newline. Similarly, if the column contains a literal "\n", it will not be escaped, so the transform script has no way to distinguish it from a newline. With this patch, more cases like this will be added. Maybe not for this patch but as a follow-up, we might want to escape \\ too to keep the escaping mapping complete. Other than that, the patch looks good to me. > escape more chars for script operator > - > > Key: HIVE-3030 > URL: https://issues.apache.org/jira/browse/HIVE-3030 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > > Only newlines were being escaped. > The same needs to be done for carriage returns and tabs.
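The round-trip concern raised in this comment can be illustrated with a small sketch. This is not Hive's script-operator code: the function names `escape` and `unescape` and the escape table are hypothetical, chosen only to show why the backslash itself has to be escaped before a literal "\n" in a column can be told apart from a real newline.

```python
# Hypothetical escape table for illustration; not taken from the Hive patch.
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def escape(field: str) -> str:
    """Replace backslash, newline, carriage return and tab with two-character escapes."""
    return "".join(ESCAPES.get(ch, ch) for ch in field)


def unescape(field: str) -> str:
    """Invert escape(); unknown or dangling escapes are kept verbatim."""
    out = []
    chars = iter(field)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None:
            out.append("\\")          # trailing lone backslash
        elif nxt in UNESCAPES:
            out.append(UNESCAPES[nxt])
        else:
            out.append("\\" + nxt)    # not one of our escapes
    return "".join(out)


real_newline = "a\nb"      # three characters: a, newline, b
literal_slash_n = "a\\nb"  # four characters: a, backslash, n, b

# Because the backslash is escaped too, the two inputs stay distinguishable.
distinct = escape(real_newline) != escape(literal_slash_n)
round_trips = all(unescape(escape(s)) == s for s in (real_newline, literal_slash_n))
```

If the `"\\"` entry were dropped from the table, both inputs would escape to the same four characters, which is the ambiguity the comment describes.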
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.3.patch Reran all test suites and fixed several more wrong test results. > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.2.patch Fixed an assert issue and restored some test result files that were changed incorrectly by HIVE-1538. > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Open (was: Patch Available) There's a bug. > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.1.patch Fix the problem by allowing for the sample filter operator to be the first filter operator after the table scan. > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' > will not trigger input pruning. > The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Created] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Example: select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, which is broken by HIVE-1538, even if the feature is not turned on.
[jira] [Assigned] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2360: - Assignee: (was: Franklin Hu) > create dynamic partition if and only if intermediate source has files > - > > Key: HIVE-2360 > URL: https://issues.apache.org/jira/browse/HIVE-2360 > Project: Hive > Issue Type: Bug >Reporter: Franklin Hu >Priority: Minor > Fix For: 0.8.0 > > Attachments: hive-2360.1.patch, hive-2360.2.patch > > > There are some conditions under which a partition description is created due to insert overwriting a table using dynamic partitioning for partitions that are empty (have no files).
[jira] [Commented] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103145#comment-13103145 ] Siying Dong commented on HIVE-2360: --- Franklin finished his internship and left. We should find someone else to finish the task. > create dynamic partition if and only if intermediate source has files > - > > Key: HIVE-2360 > URL: https://issues.apache.org/jira/browse/HIVE-2360 > Project: Hive > Issue Type: Bug >Reporter: Franklin Hu >Assignee: Franklin Hu >Priority: Minor > Fix For: 0.8.0 > > Attachments: hive-2360.1.patch, hive-2360.2.patch > > > There are some conditions under which a partition description is created due to insert overwriting a table using dynamic partitioning for partitions that are empty (have no files).
[jira] [Resolved] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2378. --- Resolution: Fixed Committed. Thanks Kevin! > Warn user that precision is lost when bigint is implicitly cast to double. > -- > > Key: HIVE-2378 > URL: https://issues.apache.org/jira/browse/HIVE-2378 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt > > > When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue, we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
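The precision loss described in this issue can be reproduced outside Hive. The sketch below uses Python, whose `float` is a 64-bit IEEE 754 double like Java's `double`; the helper name `equal_as_double` is hypothetical and only mimics an equality check performed after both sides are converted to double. It does not reproduce Hive's type-coercion code.

```python
# Both values fit in a signed 64-bit integer (a bigint), but a double has
# only 53 bits of significand, so the second one cannot be stored exactly.
a = 2**53        # 9007199254740992
b = 2**53 + 1    # 9007199254740993


def equal_as_double(x: int, y: int) -> bool:
    """Compare two integers after converting both to double precision."""
    return float(x) == float(y)


integers_differ = a != b                  # exact integer comparison
doubles_collide = equal_as_double(a, b)   # comparison after the lossy cast
```

Two distinct bigint values compare as equal once both are cast to double, which is the kind of unexpected behavior the issue asks Hive to warn about.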
[jira] [Commented] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093991#comment-13093991 ] Siying Dong commented on HIVE-2378: --- +1, will commit if unit tests pass. > Warn user that precision is lost when bigint is implicitly cast to double. > -- > > Key: HIVE-2378 > URL: https://issues.apache.org/jira/browse/HIVE-2378 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt > > > When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue, we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091305#comment-13091305 ] Siying Dong commented on HIVE-2385: --- @Carl, are you still seeing tests failing? > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Assignee: (was: Franklin Hu) > create empty files if and only if table is bucketed and hive.enforce.bucketing=true > --- > > Key: HIVE-2352 > URL: https://issues.apache.org/jira/browse/HIVE-2352 > Project: Hive > Issue Type: Improvement >Reporter: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch > > > create table t1 (key int, value string) stored as rcfile; > insert overwrite table t1 select * from src where false; > Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090399#comment-13090399 ] Siying Dong commented on HIVE-2352: --- I ran the tests twice. Both runs crashed. I think this is an important patch that will dramatically improve the latency of some queries (like scanning a large dataset for one or two rows). (Currently I sometimes do an "ORDER BY LIMIT BY" to speed it up if I know the data set is small.) We should raise the priority. > create empty files if and only if table is bucketed and hive.enforce.bucketing=true > --- > > Key: HIVE-2352 > URL: https://issues.apache.org/jira/browse/HIVE-2352 > Project: Hive > Issue Type: Bug >Reporter: Franklin Hu >Assignee: Franklin Hu >Priority: Minor > Fix For: 0.8.0 > > Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch > > > create table t1 (key int, value string) stored as rcfile; > insert overwrite table t1 select * from src where false; > Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Priority: Major (was: Minor) Issue Type: Improvement (was: Bug) > create empty files if and only if table is bucketed and hive.enforce.bucketing=true > --- > > Key: HIVE-2352 > URL: https://issues.apache.org/jira/browse/HIVE-2352 > Project: Hive > Issue Type: Improvement >Reporter: Franklin Hu >Assignee: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch > > > create table t1 (key int, value string) stored as rcfile; > insert overwrite table t1 select * from src where false; > Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090397#comment-13090397 ] Siying Dong commented on HIVE-2385: --- It passed all the tests. > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Status: Patch Available (was: Open) > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Attachment: HIVE-2385.2.patch Fixed the bug; the patch now passes autolocal1.q. I'm running the full test suite now. > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089575#comment-13089575 ] Siying Dong commented on HIVE-2352: --- Franklin's internship ended. Let me apply his patch and see whether any tests fail. > create empty files if and only if table is bucketed and hive.enforce.bucketing=true > --- > > Key: HIVE-2352 > URL: https://issues.apache.org/jira/browse/HIVE-2352 > Project: Hive > Issue Type: Bug >Reporter: Franklin Hu >Assignee: Franklin Hu >Priority: Minor > Fix For: 0.8.0 > > Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch > > > create table t1 (key int, value string) stored as rcfile; > insert overwrite table t1 select * from src where false; > Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086539#comment-13086539 ] Siying Dong commented on HIVE-2385: --- I don't know why, but I can't create a review board request using this patch. > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Assigned] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2385: - Assignee: Siying Dong > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Status: Patch Available (was: Open) > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch > > > Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can use local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Attachment: HIVE-2385.1.patch Further estimate input for LIMIT when deciding local mode. Also fix a bug (won't cause wrong result) of the LIMIT optimization. > Local Mode can be more aggressive if LIMIT optimization is on > - > > Key: HIVE-2385 > URL: https://issues.apache.org/jira/browse/HIVE-2385 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Priority: Minor > Attachments: HIVE-2385.1.patch > > > Local mode now depends on total input data, but for LIMIT queries with no > filtering, the data actually scanned can be much less and it's relatively > predictable. We can place local mode more aggressively. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Priority: Minor Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and it's relatively predictable. We can place local mode more aggressively. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
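A rough Python sketch of the decision described in HIVE-2385, for illustration only. It is not the code in HIVE-2385.1.patch: the function names, the `avg_row_bytes` estimate and the `max_local_bytes` threshold are all assumptions introduced here.

```python
def estimated_input_bytes(total_input_bytes, limit=None, has_filter=False,
                          avg_row_bytes=None):
    """Estimate how much data a query will scan (hypothetical helper)."""
    # Without a LIMIT, with a filter, or without a row-size estimate,
    # the scan size is not predictable, so fall back to the total input size.
    if limit is None or has_filter or avg_row_bytes is None:
        return total_input_bytes
    # LIMIT with no filtering: roughly `limit` rows need to be read.
    return min(total_input_bytes, limit * avg_row_bytes)


def can_run_local(total_input_bytes, max_local_bytes, **kwargs):
    """Choose local mode when the estimated scan fits under the threshold."""
    return estimated_input_bytes(total_input_bytes, **kwargs) <= max_local_bytes
```

With this shape, a query over a very large table can still qualify for local mode when it has a small LIMIT and no filter, which is the case the description singles out.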
[jira] [Updated] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2272: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks Franklin! > add TIMESTAMP data type > --- > > Key: HIVE-2272 > URL: https://issues.apache.org/jira/browse/HIVE-2272 > Project: Hive > Issue Type: New Feature >Reporter: Franklin Hu >Assignee: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2272.1.patch, hive-2272.10.patch, > hive-2272.11.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, > hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, > hive-2272.9.patch > > > Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 > 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision > using both LazyBinary and LazySimple SerDes. > For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp > parsable strings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2282. --- Resolution: Fixed Committed. Thanks Kevin! > Local mode needs to work well with block sampling > - > > Key: HIVE-2282 > URL: https://issues.apache.org/jira/browse/HIVE-2282 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Kevin Wilfong > Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, > HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt > > > Currently, if block sampling is enabled and large set of data are sampled to > a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082007#comment-13082007 ] Siying Dong commented on HIVE-2272: --- +1, please open a follow up JIRA for setting timezones. > add TIMESTAMP data type > --- > > Key: HIVE-2272 > URL: https://issues.apache.org/jira/browse/HIVE-2272 > Project: Hive > Issue Type: New Feature >Reporter: Franklin Hu >Assignee: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2272.1.patch, hive-2272.10.patch, > hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, > hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch > > > Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 > 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision > using both LazyBinary and LazySimple SerDes. > For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp > parsable strings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2309: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Paul! > Incorrect regular expression for extracting task id from filename > - > > Key: HIVE-2309 > URL: https://issues.apache.org/jira/browse/HIVE-2309 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Paul Yang >Assignee: Paul Yang >Priority: Minor > Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch > > > For producing the correct filenames for bucketed tables, there is a method in > Utilities.java that extracts out the task id from the filename and replaces > it with the bucket number. There is a bug in the regex that is used to > extract this value for attempt numbers >= 10: > {code} > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_10').group(1) > '10' > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_9').group(1) > '001210' > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Summary: Comparison Operators convert number types to common type instead of double if possible (was: Comparison Operators convert number types to common type instead of double if necessary) > Comparison Operators convert number types to common type instead of double if > possible > -- > > Key: HIVE-2248 > URL: https://issues.apache.org/jira/browse/HIVE-2248 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Siying Dong >Assignee: Siying Dong > Fix For: 0.8.0 > > Attachments: HIVE-2248.1.patch > > > Now if the two sides of a comparison are of different types, we always convert > both to double and compare. It was a slight regression from the change in > https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, > using GenericUDFBridge, always tried to find a common type first. > The worst case is this: If you did "WHERE = 0 ", we always > convert the column and 0 to double and compare, which is wasteful, though it > is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420 ] Siying Dong commented on HIVE-2309: --- +1, will commit after tests pass > Incorrect regular expression for extracting task id from filename > - > > Key: HIVE-2309 > URL: https://issues.apache.org/jira/browse/HIVE-2309 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Paul Yang >Assignee: Paul Yang >Priority: Minor > Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch > > > For producing the correct filenames for bucketed tables, there is a method in > Utilities.java that extracts out the task id from the filename and replaces > it with the bucket number. There is a bug in the regex that is used to > extract this value for attempt numbers >= 10: > {code} > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_10').group(1) > '10' > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_9').group(1) > '001210' > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071409#comment-13071409 ] Siying Dong commented on HIVE-2309: --- can we limit number of digits for the attempt ID? > Incorrect regular expression for extracting task id from filename > - > > Key: HIVE-2309 > URL: https://issues.apache.org/jira/browse/HIVE-2309 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Paul Yang >Assignee: Paul Yang >Priority: Minor > Attachments: HIVE-2309.1.patch > > > For producing the correct filenames for bucketed tables, there is a method in > Utilities.java that extracts out the task id from the filename and replaces > it with the bucket number. There is a bug in the regex that is used to > extract this value for attempt numbers >= 10: > {code} > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_10').group(1) > '10' > >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", > >>> 'attempt_201107090429_64965_m_001210_9').group(1) > '001210' > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
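The behaviour reported in HIVE-2309, and the effect of bounding the attempt digits as asked in the comment above, can be reproduced with a short Python sketch. `BUGGY` is the pattern quoted in the issue. `BOUNDED` and its limit of three digits are assumptions chosen for illustration, not necessarily what the committed patch does, and `task_id` is a hypothetical helper.

```python
import re

# Pattern quoted in the issue: the optional attempt suffix allows one digit only.
BUGGY = re.compile(r"^.*?([0-9]+)(_[0-9])?(\..*)?$")

# Hypothetical variant: allow a bounded run of attempt digits (1 to 3 here).
BOUNDED = re.compile(r"^.*?([0-9]+)(_[0-9]{1,3})?(\..*)?$")


def task_id(pattern, filename):
    """Return the first capture group (the task id), or None if there is no match."""
    match = pattern.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("attempt_201107090429_64965_m_001210_9",
                 "attempt_201107090429_64965_m_001210_10"):
        print(name, task_id(BUGGY, name), task_id(BOUNDED, name))
```

With the single-digit suffix, a two-digit attempt number cannot be consumed by the optional group, so the lazy prefix keeps growing until the attempt number itself is captured as the task id. Bounding the suffix rather than making it unbounded keeps the pattern from treating an arbitrarily long trailing number as an attempt ID.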
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070865#comment-13070865 ] Siying Dong commented on HIVE-2282: --- I don't know why but I ran the test suites twice and both failed. Can you rebase your codes and try to run the whole test suites and see whether all the tests pass? I'll try again too. > Local mode needs to work well with block sampling > - > > Key: HIVE-2282 > URL: https://issues.apache.org/jira/browse/HIVE-2282 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Kevin Wilfong > Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, > HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt > > > Currently, if block sampling is enabled and large set of data are sampled to > a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070811#comment-13070811 ] Siying Dong commented on HIVE-2249: --- Joseph, can you handle the string case too? > When creating constant expression for numbers, try to infer type from another > comparison operand, instead of trying to use integer first, and then long and > double > -- > > Key: HIVE-2249 > URL: https://issues.apache.org/jira/browse/HIVE-2249 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Joseph Barillari > Attachments: HIVE-2249.1.patch.txt > > > Here is the current code to build a constant expression for numbers: > try { > v = Double.valueOf(expr.getText()); > v = Long.valueOf(expr.getText()); > v = Integer.valueOf(expr.getText()); > } catch (NumberFormatException e) { > // do nothing here, we will throw an exception in the following block > } > if (v == null) { > throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT > .getMsg(expr)); > } > return new ExprNodeConstantDesc(v); > Then for a case like "WHERE = 0", or "WHERE > = 0", we always have to do a type conversion when comparing, which is > unnecessary if we are slightly smarter about choosing the type when creating the > constant expression. We can simply walk one level up the tree, find the other > comparison operand and use the same type as that one if it is possible. For > a user's wrong query like '=1.1', we can even do more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
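To make the proposal in HIVE-2249 concrete, here is a small Python sketch. It is not Hive code: the type names are plain strings, `narrowest_type` only imitates the effect of the quoted Java snippet (the last successful parse wins, so the narrowest type is chosen), and `infer_constant` is a hypothetical illustration of taking the type from the other comparison operand.

```python
# Value ranges of the two integer types used in this sketch.
INT_RANGE = (-2**31, 2**31 - 1)
BIGINT_RANGE = (-2**63, 2**63 - 1)


def narrowest_type(text):
    """Imitate the quoted code: double, then long, then int; the last success wins."""
    type_name, value = "double", float(text)  # ValueError if text is not numeric
    try:
        as_int = int(text)
    except ValueError:
        return type_name, value  # e.g. "1.1" stays a double
    if BIGINT_RANGE[0] <= as_int <= BIGINT_RANGE[1]:
        type_name, value = "bigint", as_int
    if INT_RANGE[0] <= as_int <= INT_RANGE[1]:
        type_name, value = "int", as_int
    return type_name, value


def infer_constant(text, other_type):
    """Use the other operand's type when the literal fits it (hypothetical)."""
    default_type, default_value = narrowest_type(text)
    if other_type == "double":
        return "double", float(text)
    if other_type == "bigint" and default_type in ("int", "bigint"):
        return "bigint", default_value
    if other_type == "int" and default_type == "int":
        return "int", default_value
    # The literal does not fit the other operand's type (for example an int
    # column compared with 1.1): keep the default and convert as before.
    return default_type, default_value
```

For example, `infer_constant("0", "bigint")` yields a bigint constant, so a comparison against a bigint column would need no conversion, while `narrowest_type("0")` alone yields an int.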
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.4.patch > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch, > HIVE-2236.4.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2249: - Assignee: Joseph Barillari > When creating constant expression for numbers, try to infer type from another > comparison operand, instead of trying to use integer first, and then long and > double > -- > > Key: HIVE-2249 > URL: https://issues.apache.org/jira/browse/HIVE-2249 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Joseph Barillari > Attachments: HIVE-2249.1.patch.txt > > > Here is the current code to build a constant expression for numbers: > try { > v = Double.valueOf(expr.getText()); > v = Long.valueOf(expr.getText()); > v = Integer.valueOf(expr.getText()); > } catch (NumberFormatException e) { > // do nothing here, we will throw an exception in the following block > } > if (v == null) { > throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT > .getMsg(expr)); > } > return new ExprNodeConstantDesc(v); > Then for a case like "WHERE = 0", or "WHERE > = 0", we always have to do a type conversion when comparing, which is > unnecessary if we are slightly smarter about choosing the type when creating the > constant expression. We can simply walk one level up the tree, find the other > comparison operand and use the same type as that one if it is possible. For > a user's wrong query like '=1.1', we can even do more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2296: -- Resolution: Fixed Status: Resolved (was: Patch Available) committed. Thanks Franklin! > bad compressed file names from insert into > -- > > Key: HIVE-2296 > URL: https://issues.apache.org/jira/browse/HIVE-2296 > Project: Hive > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Franklin Hu >Assignee: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2296.1.patch, hive-2296.2.patch > > > When INSERT INTO is run on a table with compressed output > (hive.exec.compress.output=true) and existing files in the table, it may copy > the new files in bad file names: > Before INSERT INTO: > 00_0.gz > After INSERT INTO: > 00_0.gz > 00_0.gz_copy_1 > This causes corrupted output when doing a SELECT * on the table. > Correct behavior should be to pick a valid filename such as: > 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
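The naming rule that HIVE-2296 describes as correct (put the copy suffix before the extension, not after it) can be sketched in a few lines of Python. `copy_name` is a hypothetical helper, not the function changed by the patch, and treating everything from the first dot onward as the extension is an assumption made here.

```python
def copy_name(filename, copy_index):
    """Insert _copy_N before the extension, if there is one (hypothetical helper)."""
    dot = filename.find(".")
    if dot <= 0:
        # No extension (or a leading dot only): append the suffix at the end.
        return f"{filename}_copy_{copy_index}"
    # Keep the extension last so codecs chosen by file suffix still apply.
    return f"{filename[:dot]}_copy_{copy_index}{filename[dot:]}"


if __name__ == "__main__":
    print(copy_name("00_0.gz", 1))
    print(copy_name("00_0", 1))
```

Keeping `.gz` as the final suffix matters because a reader that picks the decompression codec from the file extension would otherwise treat `00_0.gz_copy_1` as uncompressed data.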
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069611#comment-13069611 ] Siying Dong commented on HIVE-2282: --- Also, query like "select key, value from sih_src tablesample(1 percent)" actually doesn't generate stable result. You can use select count(1) instead. That will generate correct results. > Local mode needs to work well with block sampling > - > > Key: HIVE-2282 > URL: https://issues.apache.org/jira/browse/HIVE-2282 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Kevin Wilfong > Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, > HIVE-2282.3.patch.txt > > > Currently, if block sampling is enabled and large set of data are sampled to > a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069610#comment-13069610 ] Siying Dong commented on HIVE-2282: --- Kevin, you forgot to add file ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out to the patch. > Local mode needs to work well with block sampling > - > > Key: HIVE-2282 > URL: https://issues.apache.org/jira/browse/HIVE-2282 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Kevin Wilfong > Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, > HIVE-2282.3.patch.txt > > > Currently, if block sampling is enabled and large set of data are sampled to > a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069314#comment-13069314 ] Siying Dong commented on HIVE-2296: --- +1 > bad compressed file names from insert into > -- > > Key: HIVE-2296 > URL: https://issues.apache.org/jira/browse/HIVE-2296 > Project: Hive > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Franklin Hu >Assignee: Franklin Hu > Fix For: 0.8.0 > > Attachments: hive-2296.1.patch, hive-2296.2.patch > > > When INSERT INTO is run on a table with compressed output > (hive.exec.compress.output=true) and existing files in the table, it may copy > the new files in bad file names: > Before INSERT INTO: > 00_0.gz > After INSERT INTO: > 00_0.gz > 00_0.gz_copy_1 > This causes corrupted output when doing a SELECT * on the table. > Correct behavior should be to pick a valid filename such as: > 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069126#comment-13069126 ] Siying Dong commented on HIVE-2247: --- I'm looking at the patch. Please test backward compatibility between old server and new client, and between new server and old client. Please come by if you don't know how to test it. > ALTER TABLE RENAME PARTITION > > > Key: HIVE-2247 > URL: https://issues.apache.org/jira/browse/HIVE-2247 > Project: Hive > Issue Type: New Feature >Reporter: Siying Dong >Assignee: Weiyan Wang > Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, > HIVE-2247.5.patch.txt > > > We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER > TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.3.patch fix a bug > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Open (was: Patch Available) > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.4.patch 1. change block merge task too 2. change the capital file name > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, > HIVE-2201.4.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > especially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
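The modified protocol in HIVE-2201 can be walked through on a local filesystem with the Python sketch below. This is a simulation only, not Hive or HDFS code: `write_task_output` and `commit` are hypothetical names, local renames stand in for name node calls, and step 4 (removing duplicate files) is left out.

```python
import os
import tempfile


def write_task_output(base, name, data):
    """Steps 1-2: write under tmp1, then move the finished file into tmp2."""
    tmp1 = os.path.join(base, "tmp1")
    tmp2 = os.path.join(base, "tmp2")
    os.makedirs(tmp1, exist_ok=True)
    os.makedirs(tmp2, exist_ok=True)
    tmp_file = os.path.join(tmp1, "_tmp_" + name)
    with open(tmp_file, "w") as handle:
        handle.write(data)
    # Only finished files ever leave tmp1, so tmp2 never holds _tmp_ files.
    os.replace(tmp_file, os.path.join(tmp2, name))


def commit(base):
    """Step 3: move tmp2 to tmp3 and list what was committed."""
    tmp3 = os.path.join(base, "tmp3")
    os.rename(os.path.join(base, "tmp2"), tmp3)
    return sorted(os.listdir(tmp3))


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    write_task_output(base, "1", "rows from task 1")
    write_task_output(base, "2", "rows from task 2")
    print(commit(base))
```

The point of the change shows up in `commit`: leftovers from unfinished or speculative attempts stay behind in tmp1, so the final directory needs no pass to delete files starting with `_tmp`.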
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.2.patch remove the MapRedStat list from DriverContext and add more counters. > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066299#comment-13066299 ] Siying Dong commented on HIVE-2282: --- +1, will commit after testing. > Local mode needs to work well with block sampling > - > > Key: HIVE-2282 > URL: https://issues.apache.org/jira/browse/HIVE-2282 > Project: Hive > Issue Type: Improvement >Reporter: Siying Dong >Assignee: Kevin Wilfong > Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, > HIVE-2282.3.patch.txt > > > Currently, if block sampling is enabled and large set of data are sampled to > a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064994#comment-13064994 ] Siying Dong commented on HIVE-2247: --- Sorry for the confusion. I just meant to change the directory name where the data is, and change the "location" parameter in the partition metadata. If we decide not to change the physical path, we just change the partition name. If we need to change the physical path, then we need to change the partition name and location. > ALTER TABLE RENAME PARTITION > > > Key: HIVE-2247 > URL: https://issues.apache.org/jira/browse/HIVE-2247 > Project: Hive > Issue Type: New Feature >Reporter: Siying Dong >Assignee: Weiyan Wang > Attachments: HIVE-2247.3.patch.txt > > > We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER > TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064935#comment-13064935 ] Siying Dong commented on HIVE-2272: --- Can you add it to review board? > add TIMESTAMP data type > --- > > Key: HIVE-2272 > URL: https://issues.apache.org/jira/browse/HIVE-2272 > Project: Hive > Issue Type: New Feature >Reporter: Franklin Hu >Assignee: Franklin Hu > Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch > > > Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 > 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision > using both LazyBinary and LazySimple SerDes. > For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp > parsable strings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2282) Local mode needs to work well with block sampling
Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Currently, if block sampling is enabled and large set of data are sampled to a small set, local mode needs to be kicked in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2247: - Assignee: Weiyan Wang > ALTER TABLE RENAME PARTITION > > > Key: HIVE-2247 > URL: https://issues.apache.org/jira/browse/HIVE-2247 > Project: Hive > Issue Type: New Feature >Reporter: Siying Dong >Assignee: Weiyan Wang > > We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER > TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch > > > CPU Milliseconds information is available from Hadoop's framework. Printing it > out to Hive CLI when executing a job will help users to know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-306) Support "INSERT [INTO] destination"
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-306: - Status: Patch Available (was: Open) > Support "INSERT [INTO] destination" > --- > > Key: HIVE-306 > URL: https://issues.apache.org/jira/browse/HIVE-306 > Project: Hive > Issue Type: New Feature >Reporter: Zheng Shao >Assignee: Franklin Hu > Attachments: hive-306.1.patch, hive-306.2.patch, hive-306.3.patch, > hive-306.4.patch > > > Currently hive only supports "INSERT OVERWRITE destination". We should > support "INSERT [INTO] destination". -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1721) use bloom filters to improve the performance of joins
[ https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063501#comment-13063501 ] Siying Dong commented on HIVE-1721: --- Andrew, what do you mean by "the filter could be built in parallel with an MR job"? Our initial plan was to only build filter based on smaller tables and apply the filter against the big table to reduce data to be shuffled. For the syntax, the plan is to use syntax like MAPJOIN. We can do something like SELECT /*+ BLOOMFILTER(t1) +*/ ... FROM t1 JOIN t2 ... > use bloom filters to improve the performance of joins > - > > Key: HIVE-1721 > URL: https://issues.apache.org/jira/browse/HIVE-1721 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: J. Andrew Key > Labels: optimization > > In case of map-joins, it is likely that the big table will not find many > matching rows from the small table. > Currently, we perform a hash-map lookup for every row in the big table, which > can be pretty expensive. > It might be useful to try out a bloom-filter containing all the elements in > the small table. > Each element from the big table is first searched in the bloom filter, and > only in case of a positive match, > the small table hash table is explored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
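The lookup order described in HIVE-1721 (probe a bloom filter built from the small table first, and touch the hash table only on a positive) can be sketched in Python as follows. This is an illustration under assumptions, not Hive's implementation: `BloomFilter`, `map_join`, the bit-array size and the SHA-256-based hashing are all choices made for this sketch.

```python
import hashlib


class BloomFilter:
    """A minimal bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def map_join(big_rows, small_table):
    """Join (key, value) rows against a dict, probing the filter first."""
    bloom = BloomFilter()
    for key in small_table:
        bloom.add(key)
    for key, value in big_rows:
        if not bloom.might_contain(key):
            continue  # definitely absent: skip the hash-table lookup
        if key in small_table:  # a positive may be false, so confirm it
            yield key, value, small_table[key]
```

In Python a dict lookup is already cheap, so this only shows the control flow. The benefit the issue has in mind comes when most big-table rows have no match and the filter is much cheaper to probe, or to ship, than the full hash table.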
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063457#comment-13063457 ] Siying Dong commented on HIVE-2247: --- The use case is that we want to sanity-check the quality of the data under a temporary partition name before moving the data to the partition name that people treat as ready. We want to avoid scanning the data for this operation. > ALTER TABLE RENAME PARTITION > > > Key: HIVE-2247 > URL: https://issues.apache.org/jira/browse/HIVE-2247 > Project: Hive > Issue Type: New Feature >Reporter: Siying Dong > > We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER > TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-306) Support "INSERT [INTO] destination"
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058206#comment-13058206 ] Siying Dong commented on HIVE-306: -- Test breaks: TestParseNegative > Support "INSERT [INTO] destination" > --- > > Key: HIVE-306 > URL: https://issues.apache.org/jira/browse/HIVE-306 > Project: Hive > Issue Type: New Feature >Reporter: Zheng Shao >Assignee: Franklin Hu > Attachments: hive-306.1.patch, hive-306.2.patch > > > Currently hive only supports "INSERT OVERWRITE destination". We should > support "INSERT [INTO] destination". -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-306) Support "INSERT [INTO] destination"
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058140#comment-13058140 ] Siying Dong commented on HIVE-306: -- +1. Looks good to me for now. I'm running tests. If it is committed, please open a follow-up JIRA for making moving files more efficient and compacting smaller files smarter for it. > Support "INSERT [INTO] destination" > --- > > Key: HIVE-306 > URL: https://issues.apache.org/jira/browse/HIVE-306 > Project: Hive > Issue Type: New Feature >Reporter: Zheng Shao >Assignee: Franklin Hu > Attachments: hive-306.1.patch, hive-306.2.patch > > > Currently hive only supports "INSERT OVERWRITE destination". We should > support "INSERT [INTO] destination". -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong The current code that builds a constant expression for numbers is: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT .getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like "WHERE = 0", or "WHERE = 0", we always have to do a type conversion when comparing, which would be unnecessary if we were slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other operand of the comparison, and use the same type as that operand if possible. For a user's mistaken query like '=1.1', we can even do more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
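The proposal in HIVE-2249 could look roughly like the sketch below. It is not the Hive implementation: `ConstantTypeSketch`, `parseConstant` and the `hint` parameter (standing in for the type of the other comparison operand found one level up the tree) are hypothetical names, and `IllegalArgumentException` stands in for the `SemanticException` in the quoted code. The fallback path keeps the quoted try/catch chain, which ends up with the narrowest type that parses.

```java
// Hypothetical illustration only; not Hive code.
class ConstantTypeSketch {

  // Same idea as the quoted code: each successful parse overwrites v, so the
  // last one that succeeds (the narrowest type) wins.
  static Number narrowest(String text) {
    Number v = null;
    try {
      v = Double.valueOf(text);
      v = Long.valueOf(text);
      v = Integer.valueOf(text);
    } catch (NumberFormatException e) {
      // keep whatever parsed so far
    }
    return v;
  }

  // Try the type of the other comparison operand first (hint may be null),
  // and fall back to the narrowest type when the literal does not fit it.
  static Number parseConstant(String text, Class<? extends Number> hint) {
    if (hint != null) {
      try {
        if (hint == Integer.class) {
          return Integer.valueOf(text);
        }
        if (hint == Long.class) {
          return Long.valueOf(text);
        }
        if (hint == Double.class) {
          return Double.valueOf(text);
        }
      } catch (NumberFormatException e) {
        // literal does not fit the hinted type; fall through
      }
    }
    Number v = narrowest(text);
    if (v == null) {
      throw new IllegalArgumentException("Invalid numerical constant: " + text);
    }
    return v;
  }
}
```

With a Long hint, the literal 0 becomes a Long directly, so a comparison against a bigint column would need no conversion. A literal such as 1.1 does not fit the hint and falls back to Double.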
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Attachment: HIVE-2248.1.patch > Comparison Operators convert number types to common type instead of double if > necessary > --- > > Key: HIVE-2248 > URL: https://issues.apache.org/jira/browse/HIVE-2248 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2248.1.patch > > > Now if the two sides of a comparison are of different types, we always convert > both to double and compare. It was a slight regression from the change in > https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, > using GenericUDFBridge, always tried to find a common type first. > The worst case is this: if you did "WHERE = 0 ", we always > convert the column and 0 to double and compare, which is wasteful, though it > is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Status: Patch Available (was: Open) > Comparison Operators convert number types to common type instead of double if > necessary > --- > > Key: HIVE-2248 > URL: https://issues.apache.org/jira/browse/HIVE-2248 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2248.1.patch > > > Now if the two sides of a comparison are of different types, we always convert > both to double and compare. It was a slight regression from the change in > https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, > using GenericUDFBridge, always tried to find a common type first. > The worst case is this: if you did "WHERE = 0 ", we always > convert the column and 0 to double and compare, which is wasteful, though it > is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Description: Now if the two sides of a comparison are of different types, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you did "WHERE = 0 ", we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. was:Now if the two sides of a comparison are of different types, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, using GenericUDFBridge, always tried to find a common type first. > Comparison Operators convert number types to common type instead of double if > necessary > --- > > Key: HIVE-2248 > URL: https://issues.apache.org/jira/browse/HIVE-2248 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > > Now if the two sides of a comparison are of different types, we always convert > both to double and compare. It was a slight regression from the change in > https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, > using GenericUDFBridge, always tried to find a common type first. > The worst case is this: if you did "WHERE = 0 ", we always > convert the column and 0 to double and compare, which is wasteful, though it > is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, using GenericUDFBridge, always tried to find a common type first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
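To make the "find a common type first" idea in HIVE-2248 concrete, here is a minimal sketch. It assumes a simple widening order (BYTE < SHORT < INT < LONG < FLOAT < DOUBLE) in which the wider of the two operand types is the common type. The names `CommonTypeSketch`, `NumType` and `commonType` are hypothetical, and Hive's actual type-resolution rules may differ.

```java
// Hypothetical illustration only; not Hive code.
class CommonTypeSketch {

  // Declared in widening order, so the larger ordinal is the wider type.
  enum NumType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

  static NumType commonType(NumType a, NumType b) {
    return a.ordinal() >= b.ordinal() ? a : b;
  }

  // Compare in the common type; double is used only when an operand needs it.
  static int compare(Number left, NumType leftType, Number right, NumType rightType) {
    switch (commonType(leftType, rightType)) {
      case BYTE:
      case SHORT:
      case INT:
        return Integer.compare(left.intValue(), right.intValue());
      case LONG:
        return Long.compare(left.longValue(), right.longValue());
      case FLOAT:
        return Float.compare(left.floatValue(), right.floatValue());
      default:
        return Double.compare(left.doubleValue(), right.doubleValue());
    }
  }
}
```

Under these assumptions, a bigint column compared with the literal 0 is compared as long, so neither side is converted to double.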
[jira] [Created] (HIVE-2247) CREATE TABLE RENAME PARTITION
CREATE TABLE RENAME PARTITION - Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056205#comment-13056205 ] Siying Dong commented on HIVE-2035: --- committed > Use block-level merge for RCFile if merging intermediate results are needed > --- > > Key: HIVE-2035 > URL: https://issues.apache.org/jira/browse/HIVE-2035 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Franklin Hu > Attachments: hive-2035.1.patch, hive-2035.3.patch > > > Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true > the intermediate data could be merged using an additional MapReduce job. This > could be quite expensive if the data size is large. With HIVE-1950, merging > can be done in the RCFile block level so that it bypasses the > (de-)compression, (de-)serialization phases. This could improve the merge > process significantly. > This JIRA should handle the case where the input table is not stored in > RCFile, but the destination table is (which requires the intermediate data > should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2035: -- Status: Patch Available (was: Open) > Use block-level merge for RCFile if merging intermediate results are needed > --- > > Key: HIVE-2035 > URL: https://issues.apache.org/jira/browse/HIVE-2035 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Franklin Hu > Attachments: hive-2035.1.patch, hive-2035.3.patch > > > Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true > the intermediate data could be merged using an additional MapReduce job. This > could be quite expensive if the data size is large. With HIVE-1950, merging > can be done in the RCFile block level so that it bypasses the > (de-)compression, (de-)serialization phases. This could improve the merge > process significantly. > This JIRA should handle the case where the input table is not stored in > RCFile, but the destination table is (which requires the intermediate data > should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055355#comment-13055355 ] Siying Dong commented on HIVE-2035: --- +1, will run regression tests > Use block-level merge for RCFile if merging intermediate results are needed > --- > > Key: HIVE-2035 > URL: https://issues.apache.org/jira/browse/HIVE-2035 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Franklin Hu > Attachments: hive-2035.1.patch, hive-2035.3.patch > > > Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true > the intermediate data could be merged using an additional MapReduce job. This > could be quite expensive if the data size is large. With HIVE-1950, merging > can be done in the RCFile block level so that it bypasses the > (de-)compression, (de-)serialization phases. This could improve the merge > process significantly. > This JIRA should handle the case where the input table is not stored in > RCFile, but the destination table is (which requires the intermediate data > should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054595#comment-13054595 ] Siying Dong commented on HIVE-2201: --- Yongqiang: 1. As I commented previously: "According to Hairong Kuang, Hadoop's behavior for creating a new file is that it will automatically create its parent directory if it doesn't exist. In that case, I removed the directory check-and-create part when writing to a new file." 2. I tested the code. I ran the whole regression test suite and tested several cases manually in the cluster. I also tried killing some tasks manually. 3. I'll see whether there are other dependencies so that I can remove the old one. Having two overloaded calls is the convention we have in the file. All other similar calls have one function that takes a Path and one that takes a String. 4. The tree traversal logic is copied from localizeMRTmpFilesImpl(). The first loop goes through every operator tree. The second loop does a breadth-first search of the operator tree to check for any FileSinkOperator. 5. OK. I'll make the change. My understanding is that only FileSinkOperator and the BlockMerge file sink have the problem, and the second one is going to have some large changes from HIVE-2035. Also, the BlockMerge file sink suffers from the problem less, as it runs faster and so has less chance of producing incomplete results. > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. 
For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
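The modified protocol in the HIVE-2201 description can be walked through on a local filesystem as below. This is only a sketch: `java.nio.file` stands in for the Hadoop FileSystem API, the class and method names are hypothetical, and duplicate outputs from speculative task attempts (step 4) are not modelled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not Hive code.
class TmpDirProtocolSketch {

  // Steps 1 and 2: write tmp1/_tmp_<id>, then move it to tmp2/<id> on success.
  static Path commitTask(Path base, String taskId, String data) throws IOException {
    Path tmp1 = base.resolve("tmp1");
    Path tmp2 = base.resolve("tmp2");
    Files.createDirectories(tmp1);
    Files.createDirectories(tmp2); // created up front so the move needs no existence check
    Path inProgress = tmp1.resolve("_tmp_" + taskId);
    Files.write(inProgress, data.getBytes(StandardCharsets.UTF_8));
    return Files.move(inProgress, tmp2.resolve(taskId));
  }

  // Step 3: move the whole tmp2 directory to tmp3 once all tasks are done.
  static Path commitJob(Path base) throws IOException {
    return Files.move(base.resolve("tmp2"), base.resolve("tmp3"));
  }

  // Runs the protocol once in a fresh temporary directory and checks the layout.
  static boolean runOnce() {
    try {
      Path base = Files.createTempDirectory("hive2201-sketch");
      commitTask(base, "1", "row\n");
      Path finalDir = commitJob(base);
      return Files.exists(finalDir.resolve("1"))
          && !Files.exists(base.resolve("tmp1").resolve("_tmp_1"))
          && !Files.exists(base.resolve("tmp2"));
    } catch (IOException e) {
      return false;
    }
  }
}
```

Because unfinished `_tmp_` files stay behind in tmp1, the directory that gets moved in step 3 contains only completed outputs, which is why the final cleanup only has to handle duplicates.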
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch > > > CPU milliseconds information is available from Hadoop's framework. Printing it > out to the Hive CLI when executing a job will help users know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054188#comment-13054188 ] Siying Dong commented on HIVE-2201: --- ping > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.1.patch > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > Attachments: HIVE-2236.1.patch > > > CPU milliseconds information is available from Hadoop's framework. Printing it > out to the Hive CLI when executing a job will help users know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2236: - Assignee: Siying Dong > Cli: Print Hadoop's CPU milliseconds > > > Key: HIVE-2236 > URL: https://issues.apache.org/jira/browse/HIVE-2236 > Project: Hive > Issue Type: New Feature > Components: CLI >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Minor > > CPU milliseconds information is available from Hadoop's framework. Printing it > out to the Hive CLI when executing a job will help users know more about their > jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Priority: Minor CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users know more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051415#comment-13051415 ] Siying Dong commented on HIVE-2035: --- will take a look. > Use block-level merge for RCFile if merging intermediate results are needed > --- > > Key: HIVE-2035 > URL: https://issues.apache.org/jira/browse/HIVE-2035 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Franklin Hu > Attachments: hive-2035.1.patch > > > Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true > the intermediate data could be merged using an additional MapReduce job. This > could be quite expensive if the data size is large. With HIVE-1950, merging > can be done in the RCFile block level so that it bypasses the > (de-)compression, (de-)serialization phases. This could improve the merge > process significantly. > This JIRA should handle the case where the input table is not stored in > RCFile, but the destination table is (which requires the intermediate data > should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: Patch Available (was: In Progress) > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: In Progress (was: Patch Available) > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.3.patch According to Hairong Kuang, Hadoop's behavior for creating a new file is that it will automatically create its parent directory if it doesn't exist. In that case, I removed the directory check-and-create part when writing to a new file. > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.2.patch fix a bug. > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-2201 started by Siying Dong. > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.1.patch > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: (was: HIVE-2201.1.patch) > reduce name node calls in hive by creating temporary directories > > > Key: HIVE-2201 > URL: https://issues.apache.org/jira/browse/HIVE-2201 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-2201.1.patch > > > Currently, in Hive, when a file gets written by a FileSinkOperator, > the sequence of operations is as follows: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp1/1 > 3. Move directory /tmp1 to /tmp2 > 4. For all files in /tmp2, remove all files starting with _tmp and > duplicate files. > Due to speculative execution, a lot of temporary files are created > in /tmp1 (or /tmp2). This leads to a lot of name node calls, > specially for large queries. > The protocol above can be modified slightly: > 1. In tmp directory tmp1, create a tmp file _tmp_1 > 2. At the end of the operator, move > /tmp1/_tmp_1 to /tmp2/1 > 3. Move directory /tmp2 to /tmp3 > 4. For all files in /tmp3, remove all duplicate files. > This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
------------------------------
    Status: Patch Available (was: In Progress)

> reduce name node calls in hive by creating temporary directories
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
> Issue Type: Improvement
> Reporter: Namit Jain
> Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
>    duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
------------------------------
    Attachment: HIVE-2201.1.patch

Implemented the logic. Discovered one problem: when moving /tmp1/_tmp_1 to /tmp2/1, we might need to check whether /tmp2 exists before the move. This patch avoids that call by pre-creating the temp directory before submitting the job. However, we cannot do that for dynamic partitioning, as we don't know the directory names in advance. So for dynamic partitioning, there is some extra cost from additional DFS namenode reads. So far I think this tradeoff is worthwhile. Potentially this cost can be reduced by caching the directories already created. We can try that approach as a follow-up.

> reduce name node calls in hive by creating temporary directories
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
> Issue Type: Improvement
> Reporter: Namit Jain
> Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
>    duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.
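For readers following the HIVE-2201 thread, the modified protocol quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: it uses java.nio.file on a local filesystem as a stand-in for HDFS, the class and method names are hypothetical, and the "<taskId>_<attempt>" file naming used to detect duplicates from speculative execution is an assumption. As in the comment above, tmp2 is expected to be pre-created before any task publishes into it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch of the modified protocol; not Hive's FileSinkOperator code.
final class TempDirCommitSketch {

    // Steps 1 and 2: write tmp1/_tmp_<name>, then move the finished file to tmp2/<name>.
    // tmp2 must already exist (pre-created before the job is submitted).
    static Path writeAndPublish(Path tmp1, Path tmp2, String taskId, int attempt, String data)
            throws IOException {
        String name = taskId + "_" + attempt; // assumed naming scheme
        Path inProgress = tmp1.resolve("_tmp_" + name);
        Files.write(inProgress, data.getBytes(StandardCharsets.UTF_8));
        return Files.move(inProgress, tmp2.resolve(name));
    }

    // Steps 3 and 4: move tmp2 to tmp3, then keep one file per task id.
    // Unfinished _tmp_ files never reach tmp2, so no _tmp_ cleanup is needed here.
    static List<String> commit(Path tmp2, Path tmp3) throws IOException {
        Files.move(tmp2, tmp3);
        List<Path> files = new ArrayList<>();
        try (Stream<Path> listing = Files.list(tmp3)) {
            listing.sorted().forEach(files::add);
        }
        Set<String> seenTasks = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Path file : files) {
            String fileName = file.getFileName().toString();
            String taskId = fileName.substring(0, fileName.lastIndexOf('_'));
            if (seenTasks.add(taskId)) {
                kept.add(fileName);  // first attempt seen for this task wins
            } else {
                Files.delete(file);  // duplicate output from a speculative attempt
            }
        }
        return kept;
    }
}
```

In this sketch an attempt that is killed before finishing leaves its _tmp_ file behind in tmp1, so the final directory only has to be scanned for duplicate attempts.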
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
------------------------------
    Assignee: Siying Dong
    Summary: reduce name node calls in hive by creating temporary directories (was: remove name node calls in hive by creating temporary directories)

> reduce name node calls in hive by creating temporary directories
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
> Issue Type: Improvement
> Reporter: Namit Jain
> Assignee: Siying Dong
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
>    duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.
[jira] [Updated] (HIVE-2211) Fix a bug caused by HIVE-243
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2211:
------------------------------
    Summary: Fix a bug caused by HIVE-243 (was: Revert)

> Fix a bug caused by HIVE-243
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
> Issue Type: Bug
> Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-234 removed the code that waited for the threads to finish and used
> ThreadPoolExecutor.shutdown() to wait for the results. This use of
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the
> function blocks until all threads finish running, but it actually only marks
> the status and does not block. This caused wrong results from
> Utilities.getInputSummary() and caused many jobs to be executed in local mode
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with
> this more efficiently if you want.
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2211:
------------------------------
    Status: Patch Available (was: Open)

> Revert
> ------
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
> Issue Type: Bug
> Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-234 removed the code that waited for the threads to finish and used
> ThreadPoolExecutor.shutdown() to wait for the results. This use of
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the
> function blocks until all threads finish running, but it actually only marks
> the status and does not block. This caused wrong results from
> Utilities.getInputSummary() and caused many jobs to be executed in local mode
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with
> this more efficiently if you want.
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2211:
------------------------------
    Attachment: HIVE-2211.1.patch

Just a simple revert. I made one small modification: when an InterruptedException is caught, stop waiting for pending threads and exit.

> Revert
> ------
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
> Issue Type: Bug
> Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-234 removed the code that waited for the threads to finish and used
> ThreadPoolExecutor.shutdown() to wait for the results. This use of
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the
> function blocks until all threads finish running, but it actually only marks
> the status and does not block. This caused wrong results from
> Utilities.getInputSummary() and caused many jobs to be executed in local mode
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with
> this more efficiently if you want.
[jira] [Created] (HIVE-2211) Revert
Revert
------

Key: HIVE-2211
URL: https://issues.apache.org/jira/browse/HIVE-2211
Project: Hive
Issue Type: Bug
Reporter: Siying Dong

Quick fix for a bug caused by HIVE-243.
HIVE-234 removed the code that waited for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. This use of ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the function blocks until all threads finish running, but it actually only marks the status and does not block. This caused wrong results from Utilities.getInputSummary() and caused many jobs to be executed in local mode even though they have huge data.
Revert those changes quickly. We can have a follow-up to see how to deal with this more efficiently if you want.
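To illustrate the behavior described in HIVE-2211: ExecutorService.shutdown() only stops the pool from accepting new tasks and returns immediately, while awaitTermination() is the call that blocks. The sketch below is not the HIVE-2211 patch (which reverts to the earlier waiting code) and is not Utilities.getInputSummary(); the class and method names are hypothetical, and it assumes Java 8 or later for the lambda. It sums some sizes on a pool, waits for all tasks, and stops waiting if interrupted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: wait for pool tasks correctly before reading the result.
final class ShutdownWaitSketch {

    static long sumInParallel(long[] sizes, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong();
        for (long size : sizes) {
            pool.execute(() -> total.addAndGet(size));
        }
        // shutdown() rejects new tasks but does NOT wait for submitted ones.
        pool.shutdown();
        try {
            // awaitTermination() blocks until all tasks finish or the timeout expires.
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // still running; keep waiting
            }
        } catch (InterruptedException e) {
            // Stop waiting for pending tasks and exit, as the patch comment describes.
            pool.shutdownNow();
            throw e;
        }
        return total.get();
    }
}
```

Reading total right after shutdown(), without the awaitTermination() loop, could return a partial sum, which is the kind of wrong input summary the issue describes.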
[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2186:
------------------------------
    Resolution: Fixed
    Release Note: Committed. Thanks Franklin.
    Status: Resolved (was: Patch Available)

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---------------------------------------------------------------------------
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Siying Dong
> Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch,
>              hive-2186.4.patch, hive-2186.5.patch
>
> Some dynamic queries fail at the stage of loading partitions if the dynamic
> partition columns contain special characters. We need to escape all of them.
[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose
[ https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2199:
------------------------------
    Resolution: Fixed
    Release Note: Committed. Thanks Franklin.
    Status: Resolved (was: Patch Available)

> incorrect success flag passed to jobClose
> -----------------------------------------
>
> Key: HIVE-2199
> URL: https://issues.apache.org/jira/browse/HIVE-2199
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Priority: Minor
> Attachments: hive-2199.1.patch
>
> For block level merging of RCFiles, jobClose is passed the incorrect variable
> as the success flag
[jira] [Commented] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043235#comment-13043235 ]

Siying Dong commented on HIVE-2186:
-----------------------------------

You need to show partitions after dropping the partition to make sure the drop succeeded.

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---------------------------------------------------------------------------
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Siying Dong
> Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch
>
> Some dynamic queries fail at the stage of loading partitions if the dynamic
> partition columns contain special characters. We need to escape all of them.
[jira] [Commented] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041773#comment-13041773 ]

Siying Dong commented on HIVE-2186:
-----------------------------------

@Franklin, in your test case, can you also drop the partition ds=1 and show partitions again to make sure those partitions can be safely dropped?

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---------------------------------------------------------------------------
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Siying Dong
> Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch
>
> Some dynamic queries fail at the stage of loading partitions if the dynamic
> partition columns contain special characters. We need to escape all of them.
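As a rough illustration of the escaping HIVE-2186 asks for, the sketch below percent-encodes special characters in a partition value before it is used as a directory name, and decodes them again. It is not the committed patch: the class name, the method names, and in particular the set of characters treated as special (glob metacharacters plus a few path and partition separators) are assumptions made for this example.

```java
// Hypothetical sketch of escaping partition values for use in directory names.
final class PartitionPathEscapeSketch {

    // Assumed set: glob metacharacters plus path/partition separators and '%'.
    private static final String SPECIAL = "*?[]{}\\/:=%";

    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || SPECIAL.indexOf(c) >= 0) {
                // Encode as '%' followed by two uppercase hex digits.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Inverse of escape(); assumes the input was produced by escape().
    static String unescape(String path) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '%' && i + 2 < path.length() + 0 && i + 2 <= path.length() - 1) {
                out.append((char) Integer.parseInt(path.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because '%' itself is in the escaped set, a value round-trips through escape() and unescape() unchanged, and the escaped name contains no glob metacharacters from the assumed set.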