[jira] [Updated] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()

2012-09-07 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-3388:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks Rongrong!

> Improve Performance of UDF PERCENTILE_APPROX()
> --
>
> Key: HIVE-3388
> URL: https://issues.apache.org/jira/browse/HIVE-3388
> Project: Hive
>  Issue Type: Task
>Reporter: Rongrong Zhong
>Assignee: Rongrong Zhong
>Priority: Minor
> Attachments: HIVE-3388.1.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()

2012-09-06 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192
 ] 

Siying Dong commented on HIVE-3388:
---

+1

> Improve Performance of UDF PERCENTILE_APPROX()
> --
>
> Key: HIVE-3388
> URL: https://issues.apache.org/jira/browse/HIVE-3388
> Project: Hive
>  Issue Type: Task
>Reporter: Rongrong Zhong
>Assignee: Rongrong Zhong
>Priority: Minor
> Attachments: HIVE-3388.1.patch.txt
>
>




[jira] [Resolved] (HIVE-2247) ALTER TABLE RENAME PARTITION

2012-06-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong resolved HIVE-2247.
---

Resolution: Fixed

I committed the patch 7 months ago. Forgot to resolve it. Thanks Weiyan!

> ALTER TABLE RENAME PARTITION
> 
>
> Key: HIVE-2247
> URL: https://issues.apache.org/jira/browse/HIVE-2247
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siying Dong
>Assignee: Weiyan Wang
> Attachments: HIVE-2247.10.patch.txt, HIVE-2247.11.patch.txt, 
> HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, 
> HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt, 
> HIVE-2247.9.patch.txt, HIVE-2247.9.patch.txt
>
>
> We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
> TABLE RENAME.





[jira] [Resolved] (HIVE-3030) escape more chars for script operator

2012-05-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong resolved HIVE-3030.
---

Resolution: Fixed

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-22 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281137#comment-13281137
 ] 

Siying Dong commented on HIVE-3030:
---

Committed. Thanks Namit!

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280470#comment-13280470
 ] 

Siying Dong commented on HIVE-3030:
---

Tests look good to me. Will run the test suites. Let's open a follow-up JIRA to 
escape a more complete list of characters.

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280399#comment-13280399
 ] 

Siying Dong commented on HIVE-3030:
---

Discussed with Namit offline. He is going to add one more test case now.

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280294#comment-13280294
 ] 

Siying Dong commented on HIVE-3030:
---

Logic looks good to me. I'll run unit tests now. In the meantime, can you add 
tests to cover those new cases? Cases like escaping '\', and unescaping cases 
like '\\', '\\\t' or '\\\t'?

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-17 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278020#comment-13278020
 ] 

Siying Dong commented on HIVE-3030:
---

I meant "Maybe not for this patch but as a follow-up, we might want to escape 
"\" too to keep the escaping mapping a complete one."

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Commented] (HIVE-3030) escape more chars for script operator

2012-05-17 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278018#comment-13278018
 ] 

Siying Dong commented on HIVE-3030:
---

Here is a general problem (maybe not related to the new change in this patch): 
there is no way to output a literal "\n" back to Hive. It will be translated to a 
new line. In a similar way, if the column contains a literal "\n", it will not be 
escaped, so the transform script will have no way to distinguish it from a new 
line. With this patch, more cases like this will be added. Maybe not for this 
patch but as a follow-up, we might want to escape \\ too to keep the escaping 
mapping a complete one.

Other than that, the patch looks good to me.
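
To illustrate the point about a complete escaping mapping, here is a minimal sketch in Python. It is not Hive's implementation; the names ESCAPES, UNESCAPES, escape and unescape are hypothetical. It only shows that once the backslash itself is escaped, a literal backslash followed by "n" and a real new line stay distinguishable and round-trip cleanly.

```python
# Illustrative only: escape/unescape helpers with a complete mapping.
# These names are hypothetical and are not taken from the Hive code base.
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def escape(text: str) -> str:
    """Replace backslash, new line, carriage return and tab with two-character sequences."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape(text: str) -> str:
    """Invert escape(); a backslash before an unknown character is kept verbatim."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in UNESCAPES:
            out.append(UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


literal = "a\\nb"   # 'a', backslash, 'n', 'b'
newline = "a\nb"    # 'a', real new line, 'b'

# Because the backslash is escaped too, the two inputs encode differently
# and both survive a round trip.
assert escape(literal) != escape(newline)
assert unescape(escape(literal)) == literal
assert unescape(escape(newline)) == newline
```

Without the backslash entry in the mapping, both inputs above would encode to the same four characters, which is the ambiguity described in the comment.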

> escape more chars for script operator
> -
>
> Key: HIVE-3030
> URL: https://issues.apache.org/jira/browse/HIVE-3030
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Only new line was being escaped.
> The same behavior needs to be done for carriage returns, and tabs





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-16 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Status: Patch Available  (was: Open)

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-16 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Attachment: HIVE-2451.3.patch

Reran all test suites and fixed several more wrong test results.

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-16 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Attachment: HIVE-2451.2.patch

Fixed an assert issue and recovered some test result files which were 
changed incorrectly by HIVE-1538.

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-16 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Status: Open  (was: Patch Available)

There's a bug.

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-15 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Attachment: HIVE-2451.1.patch

Fix the problem by allowing the sample filter operator to be the first filter 
operator after the table scan.

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-15 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2451:
--

Status: Patch Available  (was: Open)

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator always appears as 
> the second filter after the table scan, an assumption that is broken by 
> HIVE-1538 even when that feature is not turned on.





[jira] [Created] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-15 Thread Siying Dong (JIRA)
TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
of HIVE-1538
--

 Key: HIVE-2451
 URL: https://issues.apache.org/jira/browse/HIVE-2451
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong


Example:

select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
 = 'xxx'

will not trigger input pruning.

The reason is that we assume the sample filtering operator always appears as 
the second filter after the table scan, an assumption that is broken by 
HIVE-1538 even when that feature is not turned on.
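
The position assumption can be sketched as follows. This is a hypothetical Python model, not Hive's optimizer code: the Op class, the operator kind strings and find_sample_filter are invented for illustration. It shows a lookup that walks the run of filters below the table scan instead of assuming the sample filter sits at a fixed position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Hypothetical stand-in for a query-plan operator with a single child."""
    kind: str                      # e.g. "TS" (table scan), "FIL" (filter), "SEL"
    is_sample: bool = False        # True for a filter generated by TABLESAMPLE
    child: Optional["Op"] = None


def find_sample_filter(table_scan: Op) -> Optional[Op]:
    """Return the sample filter among the filters directly below the scan, at any position."""
    op = table_scan.child
    while op is not None and op.kind == "FIL":
        if op.is_sample:
            return op
        op = op.child
    return None


sample = Op("FIL", is_sample=True, child=Op("SEL"))

# The sample filter is found whether it is the first or the second filter.
as_first = Op("TS", child=sample)
as_second = Op("TS", child=Op("FIL", child=sample))
assert find_sample_filter(as_first) is sample
assert find_sample_filter(as_second) is sample
```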





[jira] [Assigned] (HIVE-2360) create dynamic partition if and only if intermediate source has files

2011-09-12 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong reassigned HIVE-2360:
-

Assignee: (was: Franklin Hu)

> create dynamic partition if and only if intermediate source has files
> -
>
> Key: HIVE-2360
> URL: https://issues.apache.org/jira/browse/HIVE-2360
> Project: Hive
>  Issue Type: Bug
>Reporter: Franklin Hu
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: hive-2360.1.patch, hive-2360.2.patch
>
>
> There are some conditions under which a partition description is created due 
> to insert overwriting a table using dynamic partitioning for partitions that 
> are empty (have no files).





[jira] [Commented] (HIVE-2360) create dynamic partition if and only if intermediate source has files

2011-09-12 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103145#comment-13103145
 ] 

Siying Dong commented on HIVE-2360:
---

Franklin finished his internship and left. We should find someone else to finish 
the task.

> create dynamic partition if and only if intermediate source has files
> -
>
> Key: HIVE-2360
> URL: https://issues.apache.org/jira/browse/HIVE-2360
> Project: Hive
>  Issue Type: Bug
>Reporter: Franklin Hu
>Assignee: Franklin Hu
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: hive-2360.1.patch, hive-2360.2.patch
>
>
> There are some conditions under which a partition description is created due 
> to insert overwriting a table using dynamic partitioning for partitions that 
> are empty (have no files).





[jira] [Resolved] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.

2011-08-30 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong resolved HIVE-2378.
---

Resolution: Fixed

Committed. Thanks Kevin!

> Warn user that precision is lost when bigint is implicitly cast to double.
> --
>
> Key: HIVE-2378
> URL: https://issues.apache.org/jira/browse/HIVE-2378
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, 
> HIVE-2378.3.patch.txt
>
>
> When a bigint is implicitly cast to a double (when a bigint is involved in an 
> equality expression with a string or double) precision may be lost, resulting 
> in unexpected behavior.  Until we fix the underlying issue we should throw an 
> error in strict mode, and a warning in nonstrict mode alerting the user about 
> this.
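
The precision loss described above can be reproduced outside Hive. The sketch below uses Python, where int is arbitrary precision and float is an IEEE 754 double, so float() stands in for the implicit bigint-to-double cast. The values are chosen for illustration only.

```python
# float() here plays the role of the implicit bigint-to-double cast.
a = 2**53          # 9007199254740992, exactly representable as a double
b = 2**53 + 1      # 9007199254740993, not representable as a double

assert a != b                    # distinct as integers
assert float(a) == float(b)      # equal once both are cast to double
assert int(float(b)) == a        # the cast rounded b to a
```

Both values fit comfortably in a 64-bit bigint, yet an equality comparison carried out in double treats them as the same number.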





[jira] [Commented] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.

2011-08-30 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093991#comment-13093991
 ] 

Siying Dong commented on HIVE-2378:
---

+1, will commit if unit tests pass.

> Warn user that precision is lost when bigint is implicitly cast to double.
> --
>
> Key: HIVE-2378
> URL: https://issues.apache.org/jira/browse/HIVE-2378
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, 
> HIVE-2378.3.patch.txt
>
>
> When a bigint is implicitly cast to a double (when a bigint is involved in an 
> equality expression with a string or double) precision may be lost, resulting 
> in unexpected behavior.  Until we fix the underlying issue we should throw an 
> error in strict mode, and a warning in nonstrict mode alerting the user about 
> this.





[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-25 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091305#comment-13091305
 ] 

Siying Dong commented on HIVE-2385:
---

@Carl, are you still seeing tests failing?

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.
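
As a rough illustration of the idea in the description, the sketch below bounds the data a filterless LIMIT query needs to scan and bases the local-mode decision on that bound rather than on total input size. It is an assumption-laden sketch, not the logic in the attached patch: the function names, the average-row-size input and the byte threshold are all hypothetical.

```python
def estimated_scan_bytes(total_input_bytes: int, limit: int, avg_row_bytes: int) -> int:
    """Rough upper bound on bytes read by a LIMIT query that has no filter."""
    # With no filter, every row read is a row returned, so at most
    # `limit` rows are needed; never exceed the real input size.
    return min(total_input_bytes, limit * avg_row_bytes)


def can_use_local_mode(total_input_bytes: int, limit: int,
                       avg_row_bytes: int, local_max_bytes: int) -> bool:
    """Decide on the estimate instead of the total input size."""
    return estimated_scan_bytes(total_input_bytes, limit, avg_row_bytes) <= local_max_bytes


# A 10 GB table would fail a 128 MB threshold on total size,
# but LIMIT 100 over roughly 1 KB rows needs only about 100 KB.
assert can_use_local_mode(10 * 1024**3, 100, 1024, 128 * 1024**2)
```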





[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true

2011-08-24 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2352:
--

Assignee: (was: Franklin Hu)

> create empty files if and only if table is bucketed and 
> hive.enforce.bucketing=true
> ---
>
> Key: HIVE-2352
> URL: https://issues.apache.org/jira/browse/HIVE-2352
> Project: Hive
>  Issue Type: Improvement
>Reporter: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch
>
>
> create table t1 (key int, value string) stored as rcfile;
> insert overwrite table t1 select * from src where false;
> Creates an empty RCFile with no rows and size 151B. The file should not be 
> created since there are no rows.
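
The rule in the issue title reduces to a single predicate. The sketch below is illustrative Python, not code from the patch; should_create_empty_files and its parameters are hypothetical names, with enforce_bucketing standing for the value of hive.enforce.bucketing.

```python
def should_create_empty_files(is_bucketed: bool, enforce_bucketing: bool) -> bool:
    """Empty output files are written only for a bucketed table with bucketing enforced."""
    return is_bucketed and enforce_bucketing


# The table t1 in the example above is not bucketed, so no empty file is expected.
assert not should_create_empty_files(is_bucketed=False, enforce_bucketing=True)
assert should_create_empty_files(is_bucketed=True, enforce_bucketing=True)
```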





[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true

2011-08-24 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090399#comment-13090399
 ] 

Siying Dong commented on HIVE-2352:
---

I ran tests twice. Both crashed. I think it is an important patch and will 
improve latency of some queries (like scanning a large dataset for one or two 
rows) dramatically (Currently I sometimes do a "ORDER BY LIMIT BY" to speed it 
up if I know the data set is small). We should raise the priority.

> create empty files if and only if table is bucketed and 
> hive.enforce.bucketing=true
> ---
>
> Key: HIVE-2352
> URL: https://issues.apache.org/jira/browse/HIVE-2352
> Project: Hive
>  Issue Type: Bug
>Reporter: Franklin Hu
>Assignee: Franklin Hu
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch
>
>
> create table t1 (key int, value string) stored as rcfile;
> insert overwrite table t1 select * from src where false;
> Creates an empty RCFile with no rows and size 151B. The file should not be 
> created since there are no rows.





[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true

2011-08-24 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2352:
--

  Priority: Major  (was: Minor)
Issue Type: Improvement  (was: Bug)

> create empty files if and only if table is bucketed and 
> hive.enforce.bucketing=true
> ---
>
> Key: HIVE-2352
> URL: https://issues.apache.org/jira/browse/HIVE-2352
> Project: Hive
>  Issue Type: Improvement
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch
>
>
> create table t1 (key int, value string) stored as rcfile;
> insert overwrite table t1 select * from src where false;
> Creates an empty RCFile with no rows and size 151B. The file should not be 
> created since there are no rows.





[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-24 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090397#comment-13090397
 ] 

Siying Dong commented on HIVE-2385:
---

It passed all the tests.

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-23 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2385:
--

Status: Patch Available  (was: Open)

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-23 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2385:
--

Attachment: HIVE-2385.2.patch

Fixed the bug; it now passes autolocal1.q. I'm running the whole test suite now.

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true

2011-08-23 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089575#comment-13089575
 ] 

Siying Dong commented on HIVE-2352:
---

Franklin's internship ended. Let me apply his patch and see whether there are 
any failed tests.

> create empty files if and only if table is bucketed and 
> hive.enforce.bucketing=true
> ---
>
> Key: HIVE-2352
> URL: https://issues.apache.org/jira/browse/HIVE-2352
> Project: Hive
>  Issue Type: Bug
>Reporter: Franklin Hu
>Assignee: Franklin Hu
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch
>
>
> create table t1 (key int, value string) stored as rcfile;
> insert overwrite table t1 select * from src where false;
> Creates an empty RCFile with no rows and size 151B. The file should not be 
> created since there are no rows.





[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-17 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086539#comment-13086539
 ] 

Siying Dong commented on HIVE-2385:
---

I don't know why, but I can't create a Review Board request using this patch.

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Assigned] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong reassigned HIVE-2385:
-

Assignee: Siying Dong

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2385:
--

Status: Patch Available  (was: Open)

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2385:
--

Attachment: HIVE-2385.1.patch

Further estimate the input size for LIMIT queries when deciding on local mode. 
Also fix a bug in the LIMIT optimization (it does not cause wrong results).

> Local Mode can be more aggressive if LIMIT optimization is on
> -
>
> Key: HIVE-2385
> URL: https://issues.apache.org/jira/browse/HIVE-2385
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Priority: Minor
> Attachments: HIVE-2385.1.patch
>
>
> Local mode now depends on total input data, but for LIMIT queries with no 
> filtering, the data actually scanned can be much less and it's relatively 
> predictable. We can place local mode more aggressively.





[jira] [Created] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on

2011-08-17 Thread Siying Dong (JIRA)
Local Mode can be more aggressive if LIMIT optimization is on
-

 Key: HIVE-2385
 URL: https://issues.apache.org/jira/browse/HIVE-2385
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Priority: Minor


Local mode now depends on total input data, but for LIMIT queries with no 
filtering, the data actually scanned can be much less and it's relatively 
predictable. We can place local mode more aggressively.
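
As a rough illustration of the idea (not the logic in HIVE-2385.1.patch), the local-mode decision could be based on an estimate of the bytes a LIMIT query actually scans rather than on the total input size. The function names, the average-row-size input, and the threshold parameter below are all hypothetical.

```python
def estimated_scan_bytes(total_input_bytes, limit_rows=None,
                         avg_row_bytes=None, has_filter=False):
    """Estimate how many bytes a query needs to read (hypothetical heuristic)."""
    # Without a LIMIT, or with a filter, any row may be needed: assume a full scan.
    if limit_rows is None or avg_row_bytes is None or has_filter:
        return total_input_bytes
    # With a LIMIT and no filter, roughly limit_rows rows have to be read.
    return min(total_input_bytes, limit_rows * avg_row_bytes)


def should_run_locally(total_input_bytes, max_local_input_bytes, **query_info):
    """Choose local mode when the estimated scan fits under the assumed threshold."""
    estimate = estimated_scan_bytes(total_input_bytes, **query_info)
    return estimate <= max_local_input_bytes
```

With these assumptions, a 1 TB table queried with LIMIT 100 and no filter is estimated at a few kilobytes, so it would qualify for local mode even though the total input is far above any reasonable threshold.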





[jira] [Updated] (HIVE-2272) add TIMESTAMP data type

2011-08-12 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2272:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks Franklin!

> add TIMESTAMP data type
> ---
>
> Key: HIVE-2272
> URL: https://issues.apache.org/jira/browse/HIVE-2272
> Project: Hive
>  Issue Type: New Feature
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2272.1.patch, hive-2272.10.patch, 
> hive-2272.11.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, 
> hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, 
> hive-2272.9.patch
>
>
> Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 
> 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision 
> using both LazyBinary and LazySimple SerDes. 
> For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp 
> parsable strings.
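
The upper bound quoted above coincides with the largest second count a signed 32-bit integer can hold. That the range in the description derives from a 32-bit seconds field is an assumption here, not something the issue states. A small sketch to check both ends of the range:

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def utc_from_unix_seconds(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(utc_from_unix_seconds(1).isoformat())
print(utc_from_unix_seconds(INT32_MAX).isoformat())
```

The two printed values correspond to the lower and upper bounds given in the description.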





[jira] [Resolved] (HIVE-2282) Local mode needs to work well with block sampling

2011-08-12 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong resolved HIVE-2282.
---

Resolution: Fixed

Committed. Thanks Kevin!

> Local mode needs to work well with block sampling
> -
>
> Key: HIVE-2282
> URL: https://issues.apache.org/jira/browse/HIVE-2282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, 
> HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt
>
>
> Currently, if block sampling is enabled and a large data set is sampled down to 
> a small set, local mode needs to kick in. 





[jira] [Commented] (HIVE-2272) add TIMESTAMP data type

2011-08-09 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082007#comment-13082007
 ] 

Siying Dong commented on HIVE-2272:
---

+1, please open a follow up JIRA for setting timezones.

> add TIMESTAMP data type
> ---
>
> Key: HIVE-2272
> URL: https://issues.apache.org/jira/browse/HIVE-2272
> Project: Hive
>  Issue Type: New Feature
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2272.1.patch, hive-2272.10.patch, 
> hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, 
> hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch
>
>
> Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 
> 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision 
> using both LazyBinary and LazySimple SerDes. 
> For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp 
> parsable strings.





[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-27 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2309:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks Paul!

> Incorrect regular expression for extracting task id from filename
> -
>
> Key: HIVE-2309
> URL: https://issues.apache.org/jira/browse/HIVE-2309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Paul Yang
>Assignee: Paul Yang
>Priority: Minor
> Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch
>
>
> For producing the correct filenames for bucketed tables, there is a method in 
> Utilities.java that extracts out the task id from the filename and replaces 
> it with the bucket number. There is a bug in the regex that is used to 
> extract this value for attempt numbers >= 10:
> {code}
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_10').group(1)
> '10'
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_9').group(1)
> '001210'
> {code}
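
The behaviour in the description can be reproduced in a few lines. The sketch below compares the quoted pattern with a variant that lets the attempt suffix have more than one digit. The bounded `{1,6}` quantifier is an assumption chosen for illustration and is not claimed to be what the attached patches do.

```python
import re

# Pattern quoted in the description: the optional attempt suffix allows one digit only.
ORIGINAL = re.compile(r"^.*?([0-9]+)(_[0-9])?(\..*)?$")
# Hypothetical variant: allow between one and six digits in the attempt suffix.
BOUNDED = re.compile(r"^.*?([0-9]+)(_[0-9]{1,6})?(\..*)?$")


def task_id(pattern, filename):
    """Return the task id captured by group 1, or None when nothing matches."""
    match = pattern.match(filename)
    return match.group(1) if match else None


name = "attempt_201107090429_64965_m_001210_10"
print(task_id(ORIGINAL, name))  # attempt number captured instead of the task id
print(task_id(BOUNDED, name))   # task id captured
```

With the original pattern, a two-digit attempt number cannot be consumed by the optional suffix group, so the lazy prefix keeps growing until group 1 lands on the attempt number itself.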





[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible

2011-07-26 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2248:
--

Summary: Comparison Operators convert number types to common type instead 
of double if possible  (was: Comparison Operators convert number types to 
common type instead of double if necessary)

> Comparison Operators convert number types to common type instead of double if 
> possible
> --
>
> Key: HIVE-2248
> URL: https://issues.apache.org/jira/browse/HIVE-2248
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Siying Dong
>Assignee: Siying Dong
> Fix For: 0.8.0
>
> Attachments: HIVE-2248.1.patch
>
>
> Now, if the two sides of a comparison are of different types, we always convert 
> both to double and compare. This is a slight regression from the change in 
> https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
> using GenericUDFBridge, always tried to find a common type first.
> The worst case is this: if you write "WHERE  = 0 ", we always 
> convert the column and 0 to double and compare, which is wasteful, though it 
> is usually a minor cost in the system. It is easy to fix.





[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420
 ] 

Siying Dong commented on HIVE-2309:
---

+1, will commit after tests pass

> Incorrect regular expression for extracting task id from filename
> -
>
> Key: HIVE-2309
> URL: https://issues.apache.org/jira/browse/HIVE-2309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Paul Yang
>Assignee: Paul Yang
>Priority: Minor
> Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch
>
>
> For producing the correct filenames for bucketed tables, there is a method in 
> Utilities.java that extracts out the task id from the filename and replaces 
> it with the bucket number. There is a bug in the regex that is used to 
> extract this value for attempt numbers >= 10:
> {code}
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_10').group(1)
> '10'
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_9').group(1)
> '001210'
> {code}





[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071409#comment-13071409
 ] 

Siying Dong commented on HIVE-2309:
---

can we limit number of digits for the attempt ID?

> Incorrect regular expression for extracting task id from filename
> -
>
> Key: HIVE-2309
> URL: https://issues.apache.org/jira/browse/HIVE-2309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Paul Yang
>Assignee: Paul Yang
>Priority: Minor
> Attachments: HIVE-2309.1.patch
>
>
> For producing the correct filenames for bucketed tables, there is a method in 
> Utilities.java that extracts out the task id from the filename and replaces 
> it with the bucket number. There is a bug in the regex that is used to 
> extract this value for attempt numbers >= 10:
> {code}
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_10').group(1)
> '10'
> >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
> ...          'attempt_201107090429_64965_m_001210_9').group(1)
> '001210'
> {code}





[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling

2011-07-25 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070865#comment-13070865
 ] 

Siying Dong commented on HIVE-2282:
---

I don't know why, but I ran the test suite twice and both runs failed. Can you 
rebase your code, run the whole test suite, and see whether all the 
tests pass? I'll try again too.

> Local mode needs to work well with block sampling
> -
>
> Key: HIVE-2282
> URL: https://issues.apache.org/jira/browse/HIVE-2282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, 
> HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt
>
>
> Currently, if block sampling is enabled and a large data set is sampled down to 
> a small set, local mode needs to kick in. 





[jira] [Commented] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double

2011-07-25 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070811#comment-13070811
 ] 

Siying Dong commented on HIVE-2249:
---

Joseph, can you handle the string case too?

> When creating constant expression for numbers, try to infer type from another 
> comparison operand, instead of trying to use integer first, and then long and 
> double
> --
>
> Key: HIVE-2249
> URL: https://issues.apache.org/jira/browse/HIVE-2249
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Joseph Barillari
> Attachments: HIVE-2249.1.patch.txt
>
>
> Here is the current code that builds a constant expression for numbers:
>   try {
>     // Each successful parse overwrites v with a narrower type; the first
>     // failure jumps to the catch block and keeps the last value that parsed.
>     v = Double.valueOf(expr.getText());
>     v = Long.valueOf(expr.getText());
>     v = Integer.valueOf(expr.getText());
>   } catch (NumberFormatException e) {
>     // do nothing here, we will throw an exception in the following block
>   }
>   if (v == null) {
>     throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT
>         .getMsg(expr));
>   }
>   return new ExprNodeConstantDesc(v);
> For cases like "WHERE  = 0" or "WHERE  
> = 0", we always have to do a type conversion when comparing, which is 
> unnecessary if we are slightly smarter about choosing the type when creating 
> the constant expression. We can simply walk one level up the tree, find the 
> other comparison operand, and use the same type as that one if possible. For 
> a user's incorrect query like '=1.1', we can even do more.
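
To make the proposal concrete, here is a minimal sketch of the two strategies, written in Python rather than Hive's Java. The type names, the widening order, and both function names are assumptions for illustration. This is not the code in HIVE-2249.1.patch.txt.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
WIDENING_RANK = {"int": 0, "bigint": 1, "double": 2}  # assumed widening order


def default_constant_type(text):
    """Pick the narrowest numeric type that can hold the literal."""
    float(text)  # raises ValueError when the literal is not numeric at all
    try:
        value = int(text)
    except ValueError:
        return "double"  # parses as a floating-point literal only, e.g. "1.1"
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if LONG_MIN <= value <= LONG_MAX:
        return "bigint"
    return "double"


def inferred_constant_type(text, other_operand_type):
    """Adopt the other operand's type when the literal fits without narrowing."""
    default = default_constant_type(text)
    other_rank = WIDENING_RANK.get(other_operand_type)
    if other_rank is not None and other_rank >= WIDENING_RANK[default]:
        return other_operand_type
    return default
```

Under these assumptions the literal 0 compared against a bigint column becomes a bigint constant, so no conversion is needed at comparison time, while 1.1 compared against an int column stays a double.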





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-25 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Attachment: HIVE-2236.4.patch

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch, 
> HIVE-2236.4.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Assigned] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double

2011-07-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong reassigned HIVE-2249:
-

Assignee: Joseph Barillari

> When creating constant expression for numbers, try to infer type from another 
> comparison operand, instead of trying to use integer first, and then long and 
> double
> --
>
> Key: HIVE-2249
> URL: https://issues.apache.org/jira/browse/HIVE-2249
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Joseph Barillari
> Attachments: HIVE-2249.1.patch.txt
>
>
> Here is the current code that builds a constant expression for numbers:
>   try {
>     // Each successful parse overwrites v with a narrower type; the first
>     // failure jumps to the catch block and keeps the last value that parsed.
>     v = Double.valueOf(expr.getText());
>     v = Long.valueOf(expr.getText());
>     v = Integer.valueOf(expr.getText());
>   } catch (NumberFormatException e) {
>     // do nothing here, we will throw an exception in the following block
>   }
>   if (v == null) {
>     throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT
>         .getMsg(expr));
>   }
>   return new ExprNodeConstantDesc(v);
> For cases like "WHERE  = 0" or "WHERE  
> = 0", we always have to do a type conversion when comparing, which is 
> unnecessary if we are slightly smarter about choosing the type when creating 
> the constant expression. We can simply walk one level up the tree, find the 
> other comparison operand, and use the same type as that one if possible. For 
> a user's incorrect query like '=1.1', we can even do more.





[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

2011-07-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2296:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed. Thanks Franklin!

> bad compressed file names from insert into
> --
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output 
> (hive.exec.compress.output=true) and existing files in the table, it may give 
> the new files bad names:
> Before INSERT INTO:
> 00_0.gz
> After INSERT INTO:
> 00_0.gz
> 00_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 00_0_copy_1.gz
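
The naming rule described above, with the copy suffix placed before the compression extension, can be sketched as follows. The helper name is hypothetical and this is not the code from the attached patches.

```python
import os


def copy_name(filename, copy_index):
    """Insert the _copy_N suffix before the file extension, if there is one."""
    stem, extension = os.path.splitext(filename)
    return f"{stem}_copy_{copy_index}{extension}"


print(copy_name("00_0.gz", 1))
```

Keeping the extension last matters because readers typically choose the decompression codec from the file name suffix, which is presumably why a name ending in `_copy_1` leads to corrupted output.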





[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling

2011-07-22 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069611#comment-13069611
 ] 

Siying Dong commented on HIVE-2282:
---

Also, a query like "select key, value from sih_src tablesample(1 percent)" 
doesn't actually generate stable results. You can use select count(1) instead. 
That will generate correct results.

> Local mode needs to work well with block sampling
> -
>
> Key: HIVE-2282
> URL: https://issues.apache.org/jira/browse/HIVE-2282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, 
> HIVE-2282.3.patch.txt
>
>
> Currently, if block sampling is enabled and a large data set is sampled down to 
> a small set, local mode needs to kick in. 





[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling

2011-07-22 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069610#comment-13069610
 ] 

Siying Dong commented on HIVE-2282:
---

Kevin, you forgot to add file 
ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out to the patch.

> Local mode needs to work well with block sampling
> -
>
> Key: HIVE-2282
> URL: https://issues.apache.org/jira/browse/HIVE-2282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, 
> HIVE-2282.3.patch.txt
>
>
> Currently, if block sampling is enabled and a large data set is sampled down to 
> a small set, local mode needs to kick in. 





[jira] [Commented] (HIVE-2296) bad compressed file names from insert into

2011-07-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069314#comment-13069314
 ] 

Siying Dong commented on HIVE-2296:
---

+1

> bad compressed file names from insert into
> --
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output 
> (hive.exec.compress.output=true) and existing files in the table, it may give 
> the new files bad names:
> Before INSERT INTO:
> 00_0.gz
> After INSERT INTO:
> 00_0.gz
> 00_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 00_0_copy_1.gz





[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION

2011-07-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069126#comment-13069126
 ] 

Siying Dong commented on HIVE-2247:
---

I'm looking at the patch. Please test backward compatibility between an old 
server and a new client, and between a new server and an old client. Please 
come by if you don't know how to test it.

> ALTER TABLE RENAME PARTITION
> 
>
> Key: HIVE-2247
> URL: https://issues.apache.org/jira/browse/HIVE-2247
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siying Dong
>Assignee: Weiyan Wang
> Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, 
> HIVE-2247.5.patch.txt
>
>
> We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
> TABLE RENAME.





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Status: Patch Available  (was: Open)

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Attachment: HIVE-2236.3.patch

fix a bug

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Status: Open  (was: Patch Available)

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-07-20 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.4.patch

1. change block merge task too
2. change the capital file name

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, 
> HIVE-2201.4.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.
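
The modified protocol can be sketched on a local filesystem as follows. This is only an illustration of steps 1 to 3: it uses plain `os` calls instead of the HDFS API, the function names are hypothetical, and duplicate handling from speculative execution (step 4) is not modelled.

```python
import os
import tempfile


def finish_task(root, task_id, data):
    """Steps 1-2: write to tmp1/_tmp_<id>, then move the finished file into tmp2."""
    tmp1 = os.path.join(root, "tmp1")
    tmp2 = os.path.join(root, "tmp2")
    os.makedirs(tmp1, exist_ok=True)
    os.makedirs(tmp2, exist_ok=True)
    in_progress = os.path.join(tmp1, f"_tmp_{task_id}")
    with open(in_progress, "w") as handle:
        handle.write(data)
    # Only completed output ever reaches tmp2, so tmp2 never holds _tmp files.
    os.replace(in_progress, os.path.join(tmp2, str(task_id)))


def finish_job(root):
    """Step 3: move tmp2 to tmp3 and return the files that were committed."""
    tmp3 = os.path.join(root, "tmp3")
    os.rename(os.path.join(root, "tmp2"), tmp3)
    return sorted(os.listdir(tmp3))


with tempfile.TemporaryDirectory() as root:
    finish_task(root, 1, "row data")
    print(finish_job(root))
```

Because in-progress files stay behind in tmp1, the directory that is moved at the end of the job contains finished files only, which is why the final cleanup no longer needs to look for `_tmp` prefixes.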





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-19 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Attachment: HIVE-2236.2.patch

remove the MapRedStat list from DriverContext and add more counters.

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling

2011-07-15 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066299#comment-13066299
 ] 

Siying Dong commented on HIVE-2282:
---

+1, will commit after testing.

> Local mode needs to work well with block sampling
> -
>
> Key: HIVE-2282
> URL: https://issues.apache.org/jira/browse/HIVE-2282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, 
> HIVE-2282.3.patch.txt
>
>
> Currently, if block sampling is enabled and a large data set is sampled down to 
> a small set, local mode needs to kick in. 





[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION

2011-07-13 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064994#comment-13064994
 ] 

Siying Dong commented on HIVE-2247:
---

Sorry for the confusion. I just meant changing the name of the directory where 
the data is, and changing the "location" parameter in the partition metadata.
If we decide not to change the physical path, we just change the partition 
name. If we need to change the physical path, then we need to change both the 
partition name and the location.



> ALTER TABLE RENAME PARTITION
> 
>
> Key: HIVE-2247
> URL: https://issues.apache.org/jira/browse/HIVE-2247
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siying Dong
>Assignee: Weiyan Wang
> Attachments: HIVE-2247.3.patch.txt
>
>
> We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
> TABLE RENAME.





[jira] [Commented] (HIVE-2272) add TIMESTAMP data type

2011-07-13 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064935#comment-13064935
 ] 

Siying Dong commented on HIVE-2272:
---

Can you add it to review board?

> add TIMESTAMP data type
> ---
>
> Key: HIVE-2272
> URL: https://issues.apache.org/jira/browse/HIVE-2272
> Project: Hive
>  Issue Type: New Feature
>Reporter: Franklin Hu
>Assignee: Franklin Hu
> Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch
>
>
> Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 
> 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision 
> using both LazyBinary and LazySimple SerDes. 
> For LazySimpleSerDe, the data is stored in jdbc compliant java.sql.Timestamp 
> parsable strings.
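
As a rough illustration of the text form described above, the sketch below round-trips a string through java.sql.Timestamp. The class and method names are hypothetical and this is not the code from the attached patches; it only shows that the JDBC-parsable string format can carry nanosecond precision.

```java
import java.sql.Timestamp;

// Hypothetical helper for illustration only; not Hive's serde2 code.
class TimestampTextSketch {
    // Parse the JDBC escape format "yyyy-mm-dd hh:mm:ss[.fffffffff]".
    static Timestamp parse(String text) {
        return Timestamp.valueOf(text);
    }

    // Render back to the same JDBC-parsable string form.
    static String render(Timestamp ts) {
        return ts.toString();
    }

    public static void main(String[] args) {
        Timestamp ts = parse("2011-07-13 10:15:30.123456789");
        System.out.println(ts.getNanos());   // fractional part, in nanoseconds
        System.out.println(render(ts));      // same text that was parsed
    }
}
```

Both valueOf and toString interpret the value in the JVM's default time zone, so a real SerDe would have to decide how to pin the zone; that choice is outside this sketch.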





[jira] [Created] (HIVE-2282) Local mode needs to work well with block sampling

2011-07-13 Thread Siying Dong (JIRA)
Local mode needs to work well with block sampling
-

 Key: HIVE-2282
 URL: https://issues.apache.org/jira/browse/HIVE-2282
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong


Currently, if block sampling is enabled and a large set of data is sampled down 
to a small set, local mode needs to kick in. 





[jira] [Assigned] (HIVE-2247) ALTER TABLE RENAME PARTITION

2011-07-13 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong reassigned HIVE-2247:
-

Assignee: Weiyan Wang

> ALTER TABLE RENAME PARTITION
> 
>
> Key: HIVE-2247
> URL: https://issues.apache.org/jira/browse/HIVE-2247
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siying Dong
>Assignee: Weiyan Wang
>
> We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
> TABLE RENAME.





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-07-11 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Status: Patch Available  (was: Open)

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Updated] (HIVE-306) Support "INSERT [INTO] destination"

2011-07-11 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-306:
-

Status: Patch Available  (was: Open)

> Support "INSERT [INTO] destination"
> ---
>
> Key: HIVE-306
> URL: https://issues.apache.org/jira/browse/HIVE-306
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Franklin Hu
> Attachments: hive-306.1.patch, hive-306.2.patch, hive-306.3.patch, 
> hive-306.4.patch
>
>
> Currently hive only supports "INSERT OVERWRITE destination". We should 
> support "INSERT [INTO] destination".





[jira] [Commented] (HIVE-1721) use bloom filters to improve the performance of joins

2011-07-11 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063501#comment-13063501
 ] 

Siying Dong commented on HIVE-1721:
---

Andrew, what do you mean by "the filter could be built in parallel with an MR 
job"? Our initial plan was to build the filter only from the smaller tables and 
apply it against the big table to reduce the data to be shuffled. 

For the syntax, the plan is to use a hint like MAPJOIN. We can do something 
like SELECT /*+ BLOOMFILTER(t1) */ ... FROM t1 JOIN t2 ...

> use bloom filters to improve the performance of joins
> -
>
> Key: HIVE-1721
> URL: https://issues.apache.org/jira/browse/HIVE-1721
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: J. Andrew Key
>  Labels: optimization
>
> In case of map-joins, it is likely that the big table will not find many 
> matching rows from the small table.
> Currently, we perform a hash-map lookup for every row in the big table, which 
> can be pretty expensive.
> It might be useful to try out a bloom-filter containing all the elements in 
> the small table.
> Each element from the big table is first searched in the bloom filter, and 
> only in case of a positive match,
> the small table hash table is explored.
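
The lookup order described above (Bloom filter first, hash table only on a positive match) can be sketched roughly as follows. This is a toy filter with two hash functions over a BitSet; the class name, the hash functions, and the probe helper are all assumptions made for illustration and are not Hive's implementation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy Bloom filter for illustration; names and hash functions are hypothetical.
class BloomJoinSketch {
    private final BitSet bits;
    private final int size;

    BloomJoinSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int h1(Object key) { return Math.floorMod(key.hashCode(), size); }
    private int h2(Object key) { return Math.floorMod(key.hashCode() * 31 + 17, size); }

    // Record a key from the small table.
    void add(Object key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(Object key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    // Probe for one big-table row: consult the filter before the hash table.
    static <K, V> V probe(BloomJoinSketch filter, Map<K, V> smallTable, K key) {
        if (!filter.mightContain(key)) {
            return null;                 // skip the hash-map lookup entirely
        }
        return smallTable.get(key);      // may still be null on a false positive
    }

    public static void main(String[] args) {
        Map<Integer, String> small = new HashMap<>();
        small.put(1, "a");
        small.put(2, "b");
        BloomJoinSketch filter = new BloomJoinSketch(1024);
        for (Integer k : small.keySet()) {
            filter.add(k);
        }
        System.out.println(probe(filter, small, 1));
        System.out.println(probe(filter, small, 999));
    }
}
```

Whether the filter pays off depends on how cheap its check is compared with the hash-map lookup it avoids, which is the open question in this issue.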





[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION

2011-07-11 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063457#comment-13063457
 ] 

Siying Dong commented on HIVE-2247:
---

Our use case is that we want to sanity-check the quality of the data under a 
temporary partition name before we move the data to the partition name that 
people treat as ready. We want to avoid scanning the data for this operation.

> ALTER TABLE RENAME PARTITION
> 
>
> Key: HIVE-2247
> URL: https://issues.apache.org/jira/browse/HIVE-2247
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siying Dong
>
> We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
> TABLE RENAME.





[jira] [Commented] (HIVE-306) Support "INSERT [INTO] destination"

2011-06-30 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058206#comment-13058206
 ] 

Siying Dong commented on HIVE-306:
--

Test breaks: TestParseNegative

> Support "INSERT [INTO] destination"
> ---
>
> Key: HIVE-306
> URL: https://issues.apache.org/jira/browse/HIVE-306
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Franklin Hu
> Attachments: hive-306.1.patch, hive-306.2.patch
>
>
> Currently hive only supports "INSERT OVERWRITE destination". We should 
> support "INSERT [INTO] destination".





[jira] [Commented] (HIVE-306) Support "INSERT [INTO] destination"

2011-06-30 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058140#comment-13058140
 ] 

Siying Dong commented on HIVE-306:
--

+1. Looks good to me for now. I'm running tests. If it is committed, please 
open a follow-up JIRA for making moving files more efficient and compacting 
smaller files smarter for it.

> Support "INSERT [INTO] destination"
> ---
>
> Key: HIVE-306
> URL: https://issues.apache.org/jira/browse/HIVE-306
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Franklin Hu
> Attachments: hive-306.1.patch, hive-306.2.patch
>
>
> Currently hive only supports "INSERT OVERWRITE destination". We should 
> support "INSERT [INTO] destination".





[jira] [Created] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double

2011-06-30 Thread Siying Dong (JIRA)
When creating constant expression for numbers, try to infer type from another 
comparison operand, instead of trying to use integer first, and then long and 
double
--

 Key: HIVE-2249
 URL: https://issues.apache.org/jira/browse/HIVE-2249
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong


Here is the current code that builds a constant expression for numbers:

  try {
    // Each successful parse overwrites v, so the narrowest type that parses wins.
    v = Double.valueOf(expr.getText());
    v = Long.valueOf(expr.getText());
    v = Integer.valueOf(expr.getText());
  } catch (NumberFormatException e) {
    // do nothing here, we will throw an exception in the following block
  }
  if (v == null) {
    throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT
        .getMsg(expr));
  }
  return new ExprNodeConstantDesc(v);


For cases such as "WHERE  = 0" or "WHERE  = 0", we always have to do a type 
conversion when comparing, which would be unnecessary if we chose the type more 
carefully when creating the constant expression. We can simply walk one level 
up the tree, find the other comparison operand, and use the same type as that 
one if possible. For a user's wrong query like '=1.1', we can even do more.
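
A minimal sketch of the idea, assuming the type of the other comparison operand is already known when the literal is parsed. The class and method names are hypothetical, and plain Java boxed types stand in for Hive's type descriptors; parseDefault mirrors the existing try-Double-then-Long-then-Integer behavior as a fallback.

```java
// Hypothetical sketch; not Hive's semantic analyzer code.
class ConstantTypeSketch {
    // Try to parse the literal as the type of the other comparison operand.
    static Number parseLike(String text, Class<? extends Number> otherSide) {
        try {
            if (otherSide == Integer.class) return Integer.valueOf(text);
            if (otherSide == Long.class) return Long.valueOf(text);
            if (otherSide == Double.class) return Double.valueOf(text);
        } catch (NumberFormatException e) {
            // The literal does not fit the other operand's type; fall back below.
        }
        return parseDefault(text);
    }

    // Existing behavior: the narrowest of Double, Long, Integer that parses wins.
    static Number parseDefault(String text) {
        Number v = null;
        try {
            v = Double.valueOf(text);
            v = Long.valueOf(text);
            v = Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // Keep whatever parsed successfully so far.
        }
        if (v == null) {
            throw new IllegalArgumentException("Invalid numerical constant: " + text);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(parseLike("0", Long.class).getClass().getSimpleName());
        System.out.println(parseLike("1.1", Long.class).getClass().getSimpleName());
        System.out.println(parseDefault("0").getClass().getSimpleName());
    }
}
```

With a Long operand on the other side, the literal 0 becomes a Long directly, so no conversion is needed at comparison time; a literal such as 1.1 falls back to Double, which is the case where the analyzer could "do more" for a wrong query.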





[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary

2011-06-30 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2248:
--

Attachment: HIVE-2248.1.patch

> Comparison Operators convert number types to common type instead of double if 
> necessary
> ---
>
> Key: HIVE-2248
> URL: https://issues.apache.org/jira/browse/HIVE-2248
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2248.1.patch
>
>
> Now if the two sides of a comparison are of different types, we always convert 
> both to double and compare. It was a slight regression from the change in 
> https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
> using GenericUDFBridge, always tried to find a common type first.
> The worst case is this: if you did "WHERE  = 0 ", we always 
> convert the column and 0 to double and compare, which is wasteful, though it 
> is usually a minor cost in the system. But it is easy to fix.
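
To make "find a common type first" concrete, here is a small sketch that picks the wider of two numeric types instead of always using double. The enum, its ordering, and the method name are assumptions for illustration; the actual rules live in Hive's type-resolution code and in the attached patch.

```java
// Hypothetical sketch of common-type selection for numeric comparisons.
class CommonNumericTypeSketch {
    // Assumed widening order, narrowest first.
    enum NumType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

    // The common type is the wider of the two operand types.
    static NumType commonType(NumType a, NumType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    public static void main(String[] args) {
        // A LONG column compared with an INT constant stays in LONG.
        System.out.println(commonType(NumType.LONG, NumType.INT));
        // Double is only needed when one side really is a double.
        System.out.println(commonType(NumType.INT, NumType.DOUBLE));
    }
}
```

Under this scheme a long column compared with the constant 0 is compared as long, and neither side is converted to double.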





[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary

2011-06-30 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2248:
--

Status: Patch Available  (was: Open)

> Comparison Operators convert number types to common type instead of double if 
> necessary
> ---
>
> Key: HIVE-2248
> URL: https://issues.apache.org/jira/browse/HIVE-2248
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2248.1.patch
>
>
> Now if the two sides of a comparison are of different types, we always convert 
> both to double and compare. It was a slight regression from the change in 
> https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
> using GenericUDFBridge, always tried to find a common type first.
> The worst case is this: if you did "WHERE  = 0 ", we always 
> convert the column and 0 to double and compare, which is wasteful, though it 
> is usually a minor cost in the system. But it is easy to fix.





[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary

2011-06-30 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2248:
--

Description: 
Now if the two sides of a comparison are of different types, we always convert 
both to double and compare. It was a slight regression from the change in 
https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
using GenericUDFBridge, always tried to find a common type first.

The worst case is this: if you did "WHERE  = 0 ", we always 
convert the column and 0 to double and compare, which is wasteful, though it is 
usually a minor cost in the system. But it is easy to fix.

  was:Now if the two sides of a comparison are of different types, we always 
convert both to double and compare. It was a slight regression from the change 
in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
using GenericUDFBridge, always tried to find a common type first.


> Comparison Operators convert number types to common type instead of double if 
> necessary
> ---
>
> Key: HIVE-2248
> URL: https://issues.apache.org/jira/browse/HIVE-2248
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> Now if the two sides of a comparison are of different types, we always convert 
> both to double and compare. It was a slight regression from the change in 
> https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
> using GenericUDFBridge, always tried to find a common type first.
> The worst case is this: if you did "WHERE  = 0 ", we always 
> convert the column and 0 to double and compare, which is wasteful, though it 
> is usually a minor cost in the system. But it is easy to fix.





[jira] [Created] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary

2011-06-30 Thread Siying Dong (JIRA)
Comparison Operators convert number types to common type instead of double if 
necessary
---

 Key: HIVE-2248
 URL: https://issues.apache.org/jira/browse/HIVE-2248
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong


Now if the two sides of a comparison are of different types, we always convert 
both to double and compare. It was a slight regression from the change in 
https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOP, 
using GenericUDFBridge, always tried to find a common type first.





[jira] [Created] (HIVE-2247) CREATE TABLE RENAME PARTITION

2011-06-30 Thread Siying Dong (JIRA)
CREATE TABLE RENAME PARTITION
-

 Key: HIVE-2247
 URL: https://issues.apache.org/jira/browse/HIVE-2247
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong


We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER 
TABLE RENAME.





[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-27 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056205#comment-13056205
 ] 

Siying Dong commented on HIVE-2035:
---

committed

> Use block-level merge for RCFile if merging intermediate results are needed
> ---
>
> Key: HIVE-2035
> URL: https://issues.apache.org/jira/browse/HIVE-2035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Franklin Hu
> Attachments: hive-2035.1.patch, hive-2035.3.patch
>
>
> Currently if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true 
> the intermediate data could be merged using an additional MapReduce job. This 
> could be quite expensive if the data size is large. With HIVE-1950, merging 
> can be done at the RCFile block level so that it bypasses the 
> (de-)compression and (de-)serialization phases. This could improve the merge 
> process significantly. 
> This JIRA should handle the case where the input table is not stored in 
> RCFile, but the destination table is (which requires that the intermediate 
> data be stored in the same format as the destination table). 





[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-27 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2035:
--

Status: Patch Available  (was: Open)

> Use block-level merge for RCFile if merging intermediate results are needed
> ---
>
> Key: HIVE-2035
> URL: https://issues.apache.org/jira/browse/HIVE-2035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Franklin Hu
> Attachments: hive-2035.1.patch, hive-2035.3.patch
>
>
> Currently if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true 
> the intermediate data could be merged using an additional MapReduce job. This 
> could be quite expensive if the data size is large. With HIVE-1950, merging 
> can be done at the RCFile block level so that it bypasses the 
> (de-)compression and (de-)serialization phases. This could improve the merge 
> process significantly. 
> This JIRA should handle the case where the input table is not stored in 
> RCFile, but the destination table is (which requires that the intermediate 
> data be stored in the same format as the destination table). 





[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-26 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055355#comment-13055355
 ] 

Siying Dong commented on HIVE-2035:
---

+1, will run regression tests

> Use block-level merge for RCFile if merging intermediate results are needed
> ---
>
> Key: HIVE-2035
> URL: https://issues.apache.org/jira/browse/HIVE-2035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Franklin Hu
> Attachments: hive-2035.1.patch, hive-2035.3.patch
>
>
> Currently if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true 
> the intermediate data could be merged using an additional MapReduce job. This 
> could be quite expensive if the data size is large. With HIVE-1950, merging 
> can be done at the RCFile block level so that it bypasses the 
> (de-)compression and (de-)serialization phases. This could improve the merge 
> process significantly. 
> This JIRA should handle the case where the input table is not stored in 
> RCFile, but the destination table is (which requires that the intermediate 
> data be stored in the same format as the destination table). 





[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-24 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054595#comment-13054595
 ] 

Siying Dong commented on HIVE-2201:
---

Yongqiang:
1. As I commented previously: "According to Hairong Kuang, Hadoop's behavior for 
creating a new file is that it will automatically create its parent directory 
if it doesn't exist. In that case, I removed the directory check-and-create 
part when writing to a new file."
2. I tested the code. I ran the whole regression test suite and tested several 
cases manually in the cluster. I tried killing some tasks manually.
3. I'll see whether there is another dependency so that I can remove the old 
one. Having two overloaded calls is the convention we have in the file. All 
other similar calls have one function taking a Path and one taking a String. 
4. The tree traversal logic is copied from localizeMRTmpFilesImpl(). The first 
loop goes through every operator tree. The second loop does a breadth-first 
search of the operator tree to check for any FileSinkOperator.
5. OK. I'll make the change. My understanding is that only FileSinkOperator and 
the BlockMerge file sink have the problem, and the second one is going to have 
some large changes from HIVE-2035. Also, the BlockMerge file sink suffers from 
the problem less because it runs faster and so has less chance of leaving 
incomplete results.
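
The two-loop traversal in point 4 can be sketched as below: an outer loop over every operator tree and an inner breadth-first search that collects matching operators. The Op class and findByName are hypothetical stand-ins rather than Hive's Operator API, and the sketch assumes plain trees in which no child is shared between parents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for an operator tree; not Hive's Operator class.
class OperatorBfsSketch {
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        Op child(Op c) {
            children.add(c);
            return this;
        }
    }

    // Collect every operator with the given name across all trees.
    static List<Op> findByName(List<Op> roots, String wanted) {
        List<Op> found = new ArrayList<>();
        for (Op root : roots) {                 // first loop: every operator tree
            Queue<Op> queue = new ArrayDeque<>();
            queue.add(root);
            while (!queue.isEmpty()) {          // second loop: breadth-first search
                Op op = queue.remove();
                if (op.name.equals(wanted)) {
                    found.add(op);
                }
                queue.addAll(op.children);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Op tree1 = new Op("TS").child(new Op("SEL").child(new Op("FS")));
        Op tree2 = new Op("TS").child(new Op("FS")).child(new Op("FIL").child(new Op("FS")));
        List<Op> roots = new ArrayList<>();
        roots.add(tree1);
        roots.add(tree2);
        System.out.println(findByName(roots, "FS").size());
    }
}
```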

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.
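
The modified protocol can be walked through on a local file system as a rough sketch. It uses java.nio.file rather than the Hadoop FileSystem API, the class and method names are hypothetical, and step 4 (removing duplicate files) is left out; the point is only that the finished file never sits beside the in-progress _tmp file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem illustration only; Hive itself works against HDFS.
class TmpDirProtocolSketch {
    // Runs steps 1-3 of the modified protocol under base and returns tmp3.
    static Path run(Path base, String taskId, byte[] data) {
        try {
            Path tmp1 = Files.createDirectories(base.resolve("tmp1"));
            Path tmp2 = Files.createDirectories(base.resolve("tmp2"));
            Path tmp3 = base.resolve("tmp3");

            // Step 1: write the in-progress file under tmp1.
            Path inProgress = tmp1.resolve("_tmp_" + taskId);
            Files.write(inProgress, data);

            // Step 2: at the end of the operator, move it into tmp2.
            Files.move(inProgress, tmp2.resolve(taskId));

            // Step 3: move the whole tmp2 directory to tmp3.
            Files.move(tmp2, tmp3);
            return tmp3;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper that runs the steps in a fresh temporary directory.
    static Path runInTempDir(String taskId) {
        try {
            Path base = Files.createTempDirectory("hive2201-sketch");
            return run(base, taskId, new byte[] {1, 2, 3});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = runInTempDir("1");
        System.out.println(Files.exists(out.resolve("1")));
    }
}
```

Because _tmp files stay behind in tmp1, the final cleanup only has to deal with duplicates, which is where the saving in name node calls is expected to come from.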





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-06-23 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Status: Patch Available  (was: Open)

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-23 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054188#comment-13054188
 ] 

Siying Dong commented on HIVE-2201:
---

ping

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.





[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-06-23 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2236:
--

Attachment: HIVE-2236.1.patch

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Attachments: HIVE-2236.1.patch
>
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Assigned] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-06-23 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong reassigned HIVE-2236:
-

Assignee: Siying Dong

> Cli: Print Hadoop's CPU milliseconds
> 
>
> Key: HIVE-2236
> URL: https://issues.apache.org/jira/browse/HIVE-2236
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
>
> CPU milliseconds information is available from Hadoop's framework. Printing it 
> out to Hive CLI when executing a job will help users to know more about their 
> jobs.





[jira] [Created] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds

2011-06-23 Thread Siying Dong (JIRA)
Cli: Print Hadoop's CPU milliseconds


 Key: HIVE-2236
 URL: https://issues.apache.org/jira/browse/HIVE-2236
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Siying Dong
Priority: Minor


CPU milliseconds information is available from Hadoop's framework. Printing it 
out to Hive CLI when executing a job will help users to know more about their 
jobs.





[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-17 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051415#comment-13051415
 ] 

Siying Dong commented on HIVE-2035:
---

will take a look.

> Use block-level merge for RCFile if merging intermediate results are needed
> ---
>
> Key: HIVE-2035
> URL: https://issues.apache.org/jira/browse/HIVE-2035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Franklin Hu
> Attachments: hive-2035.1.patch
>
>
> Currently if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true 
> the intermediate data could be merged using an additional MapReduce job. This 
> could be quite expensive if the data size is large. With HIVE-1950, merging 
> can be done at the RCFile block level so that it bypasses the 
> (de-)compression and (de-)serialization phases. This could improve the merge 
> process significantly. 
> This JIRA should handle the case where the input table is not stored in 
> RCFile, but the destination table is (which requires that the intermediate 
> data be stored in the same format as the destination table). 





[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-13 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Status: Patch Available  (was: In Progress)

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.





[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-13 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Status: In Progress  (was: Patch Available)

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.





[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-13 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.3.patch

According to Hairong Kuang, when Hadoop creates a new file it automatically 
creates the parent directory if it doesn't exist. Given that, I removed the 
directory check-and-create step when writing a new file.

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.





[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.2.patch

Fix a bug.

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-2201 started by Siying Dong.

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.1.patch

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: (was: HIVE-2201.1.patch)

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Status: Patch Available  (was: In Progress)

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.1.patch

Implemented the logic.
Discovered one problem: when moving /tmp1/_tmp_1 to /tmp2/1, we might need 
to check whether /tmp2 exists before the move. This patch avoids that call by 
pre-creating the temp directory before submitting the job. However, we cannot do 
that for dynamic partitioning, as we don't know the directory names in advance. 
So for dynamic partitioning, there is some extra cost from DFS namenode reads. 
So far I think this tradeoff is worthwhile. This cost could potentially be 
reduced by caching the directories created. We can try that approach as a 
follow-up.
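
The modified protocol and the directory-caching idea can be sketched roughly as 
follows. This is only an illustration on a local file system using 
java.nio.file, not the code in the patch: the class TmpDirCommitSketch, its 
methods, and the existenceChecks counter are hypothetical names, and a real 
implementation would go through Hadoop's FileSystem API rather than 
java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class TmpDirCommitSketch {
    // Directories this task already knows exist (hypothetical cache,
    // standing in for the "caching directories created" follow-up idea).
    private final Set<Path> knownDirs = new HashSet<>();

    // Counts how often the file system actually had to be consulted.
    public int existenceChecks = 0;

    /** Ensure dir exists, touching the file system at most once per directory. */
    public void ensureDir(Path dir) throws IOException {
        if (knownDirs.contains(dir)) {
            return; // cached: no file system call needed
        }
        existenceChecks++;
        Files.createDirectories(dir); // does nothing if dir already exists
        knownDirs.add(dir);
    }

    /** Step 2 of the modified protocol: move tmp1/_tmp_N to tmp2/N. */
    public Path commitTaskOutput(Path tmpFile, Path committedDir, String finalName)
            throws IOException {
        ensureDir(committedDir);
        return Files.move(tmpFile, committedDir.resolve(finalName));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hive2201-sketch");
        Path tmp1 = Files.createDirectory(root.resolve("tmp1"));
        Path tmp2 = root.resolve("tmp2");
        TmpDirCommitSketch sketch = new TmpDirCommitSketch();

        for (int i = 1; i <= 3; i++) {
            // Step 1: write the temporary file _tmp_i under tmp1.
            Path tmpFile = Files.write(tmp1.resolve("_tmp_" + i),
                    ("row " + i).getBytes(StandardCharsets.UTF_8));
            // Step 2: move it straight into tmp2 under its final name.
            sketch.commitTaskOutput(tmpFile, tmp2, String.valueOf(i));
        }

        // Step 3: move the directory of committed files to its final location.
        Path tmp3 = Files.move(tmp2, root.resolve("tmp3"));
        System.out.println("committed to " + tmp3 + " after "
                + sketch.existenceChecks + " directory check(s)");
    }
}
```

In this sketch the three commits consult the file system for the target 
directory only once; leftover _tmp_ files stay behind in tmp1 and never reach 
the final directory.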

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-2201.1.patch
>
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Assignee: Siying Dong
 Summary: reduce name node calls in hive by creating temporary directories  
(was: remove name node calls in hive by creating temporary directories)

> reduce name node calls in hive by creating temporary directories
> 
>
> Key: HIVE-2201
> URL: https://issues.apache.org/jira/browse/HIVE-2201
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Siying Dong
>
> Currently, in Hive, when a file gets written by a FileSinkOperator,
> the sequence of operations is as follows:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp1/1
> 3. Move directory /tmp1 to /tmp2
> 4. For all files in /tmp2, remove all files starting with _tmp and
> duplicate files.
> Due to speculative execution, a lot of temporary files are created
> in /tmp1 (or /tmp2). This leads to a lot of name node calls,
> especially for large queries.
> The protocol above can be modified slightly:
> 1. In tmp directory tmp1, create a tmp file _tmp_1
> 2. At the end of the operator, move
> /tmp1/_tmp_1 to /tmp2/1
> 3. Move directory /tmp2 to /tmp3
> 4. For all files in /tmp3, remove all duplicate files.
> This should reduce the number of tmp files.



[jira] [Updated] (HIVE-2211) Fix a bug caused by HIVE-243

2011-06-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2211:
--

Summary: Fix a bug caused by HIVE-243  (was: Revert)

> Fix a bug caused by HIVE-243
> 
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-243 removed the code that waited for the threads to finish and used 
> ThreadPoolExecutor.shutdown() to wait for the results. That usage of 
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the 
> function blocks until all threads finish running, but it actually only marks 
> the status and doesn't block. This caused wrong results from 
> Utilities.getInputSummary() and caused many jobs to be executed in local mode 
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with 
> this more efficiently if you want.



[jira] [Updated] (HIVE-2211) Revert

2011-06-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2211:
--

Status: Patch Available  (was: Open)

> Revert
> --
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-243 removed the code that waited for the threads to finish and used 
> ThreadPoolExecutor.shutdown() to wait for the results. That usage of 
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the 
> function blocks until all threads finish running, but it actually only marks 
> the status and doesn't block. This caused wrong results from 
> Utilities.getInputSummary() and caused many jobs to be executed in local mode 
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with 
> this more efficiently if you want.



[jira] [Updated] (HIVE-2211) Revert

2011-06-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2211:
--

Attachment: HIVE-2211.1.patch

Just a simple revert, with one small modification: on catching 
InterruptedException, stop waiting for pending threads and exit.

> Revert
> --
>
> Key: HIVE-2211
> URL: https://issues.apache.org/jira/browse/HIVE-2211
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
> Attachments: HIVE-2211.1.patch
>
>
> Quick fix for a bug caused by HIVE-243.
> HIVE-243 removed the code that waited for the threads to finish and used 
> ThreadPoolExecutor.shutdown() to wait for the results. That usage of 
> ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the 
> function blocks until all threads finish running, but it actually only marks 
> the status and doesn't block. This caused wrong results from 
> Utilities.getInputSummary() and caused many jobs to be executed in local mode 
> even though they have huge data.
> Revert those changes quickly. We can have a follow-up to see how to deal with 
> this more efficiently if you want.



[jira] [Created] (HIVE-2211) Revert

2011-06-09 Thread Siying Dong (JIRA)
Revert
--

 Key: HIVE-2211
 URL: https://issues.apache.org/jira/browse/HIVE-2211
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong


Quick fix for a bug caused by HIVE-243.

HIVE-243 removed the code that waited for the threads to finish and used 
ThreadPoolExecutor.shutdown() to wait for the results. That usage of 
ThreadPoolExecutor.shutdown(), however, is wrong. The code assumes that the 
function blocks until all threads finish running, but it actually only marks 
the status and doesn't block. This caused wrong results from 
Utilities.getInputSummary() and caused many jobs to be executed in local mode 
even though they have huge data.

Revert those changes quickly. We can have a follow-up to see how to deal with 
this more efficiently if you want.
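
For illustration, the difference between shutdown() and actually waiting can be 
sketched as below. This is a minimal standalone example, not the Hive code: 
the class ShutdownVsAwaitSketch and the method sumSizes are hypothetical, and 
the sleeping tasks merely stand in for the per-path work done on behalf of 
Utilities.getInputSummary(). It relies only on the standard 
java.util.concurrent behavior that shutdown() returns immediately, while 
awaitTermination() blocks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ShutdownVsAwaitSketch {

    /** Adds up the given sizes on a thread pool and returns only when all tasks are done. */
    public static long sumSizes(long[] sizes, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong();

        for (long size : sizes) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20); // simulate a slow per-path lookup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                total.addAndGet(size);
            });
        }

        // shutdown() only stops the pool from accepting new tasks and
        // returns immediately; reading total right here could miss work.
        pool.shutdown();

        try {
            // awaitTermination() is the call that blocks until tasks finish.
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting
            }
        } catch (InterruptedException e) {
            // On interruption, stop waiting for pending tasks and exit.
            pool.shutdownNow();
            throw e;
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumSizes(new long[] {1, 2, 3, 4}, 2));
    }
}
```

Reading the total directly after shutdown(), without the awaitTermination() 
loop, would typically return a partial sum, which is the kind of 
underestimated input size described above.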



[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus

2011-06-08 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2186:
--

  Resolution: Fixed
Release Note: Committed. Thanks Franklin.
  Status: Resolved  (was: Patch Available)

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Siying Dong
>Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, 
> hive-2186.4.patch, hive-2186.5.patch
>
>
> Some dynamic partition queries fail at the partition-loading stage if the 
> dynamic partition columns contain special characters. We need to escape all 
> of them.
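
One way to picture the escaping is the sketch below. It is only an assumption 
about what "escape all of them" could look like, not the patch itself: the 
class PartitionNameEscapeSketch, its methods, and the chosen character set 
(common glob metacharacters plus '%') are hypothetical, and the set the actual 
fix handles may differ.

```java
public class PartitionNameEscapeSketch {
    // Assumed set of special characters: common glob metacharacters,
    // plus '%' because it is used here as the escape character.
    private static final String SPECIAL = "*?[]{}\\%";

    /** Replaces each special character with '%' followed by its two-digit hex code. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses escape(); expects input that escape() produced. */
    public static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '%' && i + 2 < escaped.length()) {
                // Decode the two hex digits that follow the '%'.
                sb.append((char) Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "ds=2011[06]*";
        String safe = escape(raw);
        System.out.println(safe + " -> " + unescape(safe));
    }
}
```

Escaping '%' itself keeps the mapping reversible, so a partition value can be 
recovered from its directory name.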



[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose

2011-06-08 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2199:
--

  Resolution: Fixed
Release Note: Committed. Thanks Franklin.
  Status: Resolved  (was: Patch Available)

> incorrect success flag passed to jobClose
> -
>
> Key: HIVE-2199
> URL: https://issues.apache.org/jira/browse/HIVE-2199
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Franklin Hu
>Assignee: Franklin Hu
>Priority: Minor
> Attachments: hive-2199.1.patch
>
>
> For block-level merging of RCFiles, jobClose is passed the incorrect variable 
> as the success flag.



[jira] [Commented] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus

2011-06-02 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043235#comment-13043235
 ] 

Siying Dong commented on HIVE-2186:
---

You need to show partitions after dropping the partition to make sure the drop 
succeeded.

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Siying Dong
>Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch
>
>
> Some dynamic partition queries fail at the partition-loading stage if the 
> dynamic partition columns contain special characters. We need to escape all 
> of them.



[jira] [Commented] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus

2011-05-31 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041773#comment-13041773
 ] 

Siying Dong commented on HIVE-2186:
---

@Franklin, in your test case, can you also drop the partition ds=1 and show 
partitions again to make sure those partitions can be safely dropped?

> Dynamic Partitioning Failing because of characters not supported globStatus
> ---
>
> Key: HIVE-2186
> URL: https://issues.apache.org/jira/browse/HIVE-2186
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Siying Dong
>Assignee: Franklin Hu
> Attachments: hive-2186.1.patch, hive-2186.2.patch
>
>
> Some dynamic partition queries fail at the partition-loading stage if the 
> dynamic partition columns contain special characters. We need to escape all 
> of them.


