[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Status: Open  (was: Patch Available)

> Make QueryPlan serializable
> ---
>
> Key: HIVE-1117
> URL: https://issues.apache.org/jira/browse/HIVE-1117
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
>
> We need to make QueryPlan serializable so that we can resume the query some 
> time later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Attachment: (was: HIVE-1119.1.patch)

> Make QueryPlan serializable
> ---
>
> Key: HIVE-1117
> URL: https://issues.apache.org/jira/browse/HIVE-1117
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
>
> We need to make QueryPlan serializable so that we can resume the query some 
> time later.




[jira] Updated: (HIVE-1119) Make all Tasks and Works serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1119:
-

Status: Patch Available  (was: Open)

> Make all Tasks and Works serializable
> -
>
> Key: HIVE-1119
> URL: https://issues.apache.org/jira/browse/HIVE-1119
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1119.1.patch
>
>
> All Tasks/Works (not just MapredTask and MapredWork) should be serializable.




[jira] Updated: (HIVE-1119) Make all Tasks and Works serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1119:
-

Attachment: HIVE-1119.1.patch

This patch makes all tasks and works serializable.
Currently there are no additional tests.
Once the whole QueryPlan is serializable, we will serialize the whole query 
plan after compilation and deserialize the whole plan before execution. This 
will automatically test all the serialization/deserialization.

I prefer to commit this one first given that major code refactoring/clean-up is 
happening.
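
A minimal sketch of the round trip described above (serialize after compilation, deserialize before execution). It assumes plain java.io serialization; the patch itself may use a different mechanism. `DemoWork` and `PlanRoundTrip` are hypothetical stand-ins, not Hive classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class PlanRoundTrip {
    /** Hypothetical stand-in for a Task or Work object; not a real Hive class. */
    static final class DemoWork implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<DemoWork> children = new ArrayList<>();

        DemoWork(String name) {
            this.name = name;
        }
    }

    /** Serializes the plan to bytes, as one might do right after compilation. */
    static byte[] serialize(Serializable plan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(plan);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the plan from bytes, as one might do right before execution. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DemoWork root = new DemoWork("root");
        root.children.add(new DemoWork("child"));
        DemoWork copy = (DemoWork) deserialize(serialize(root));
        // The copy is a distinct object graph with the same content.
        System.out.println(copy.name + " -> " + copy.children.get(0).name);
    }
}
```

If every test query went through such a round trip, any Task or Work with a non-serializable field would fail immediately, which is the implicit test coverage the comment relies on.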

> Make all Tasks and Works serializable
> -
>
> Key: HIVE-1119
> URL: https://issues.apache.org/jira/browse/HIVE-1119
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1119.1.patch
>
>
> All Tasks/Works (not just MapredTask and MapredWork) should be serializable.




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Status: Patch Available  (was: Open)

> Make QueryPlan serializable
> ---
>
> Key: HIVE-1117
> URL: https://issues.apache.org/jira/browse/HIVE-1117
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
> Attachments: HIVE-1119.1.patch
>
>
> We need to make QueryPlan serializable so that we can resume the query some 
> time later.




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Attachment: HIVE-1119.1.patch

This patch makes all tasks and works serializable.

Currently there are no additional tests.
Once the whole QueryPlan is serializable, we will serialize the whole query 
plan after compilation and deserialize the whole plan before execution. This 
will automatically test all the serialization/deserialization.

I prefer to commit this one first given that major code refactoring/clean-up is 
happening.


> Make QueryPlan serializable
> ---
>
> Key: HIVE-1117
> URL: https://issues.apache.org/jira/browse/HIVE-1117
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
> Attachments: HIVE-1119.1.patch
>
>
> We need to make QueryPlan serializable so that we can resume the query some 
> time later.




[jira] Created: (HIVE-1119) Make all Tasks and Works serializable

2010-01-28 Thread Zheng Shao (JIRA)
Make all Tasks and Works serializable
-

 Key: HIVE-1119
 URL: https://issues.apache.org/jira/browse/HIVE-1119
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


All Tasks/Works (not just MapredTask and MapredWork) should be serializable.





[jira] Updated: (HIVE-1092) Conditional task does not increase finished job counter when filter job out.

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1092:
---

Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

> Conditional task does not increase finished job counter when filter job out.
> 
>
> Key: HIVE-1092
> URL: https://issues.apache.org/jira/browse/HIVE-1092
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1092.patch
>
>
> Ended Job = 864272330, job is filtered out (removed at runtime).
> Launching Job 2 out of 3 
> It should be "Launching Job 3 out of 3".
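
The expected behaviour can be illustrated with a small sketch. It assumes the launch message is derived from a single counter of jobs already accounted for; `JobProgress` and its methods are hypothetical and do not reflect Hive's actual driver code.

```java
final class JobProgress {
    private final int totalJobs;
    // Jobs that either ran to completion or were filtered out at runtime.
    private int accountedFor;

    JobProgress(int totalJobs) {
        this.totalJobs = totalJobs;
    }

    /** Message for the next job to be launched. */
    String launchMessage() {
        return "Launching Job " + (accountedFor + 1) + " out of " + totalJobs;
    }

    void jobFinished() {
        accountedFor++;
    }

    /** A filtered-out job must advance the counter too; skipping this is the reported bug. */
    void jobFilteredOut() {
        accountedFor++;
    }

    public static void main(String[] args) {
        JobProgress progress = new JobProgress(3);
        System.out.println(progress.launchMessage());
        progress.jobFinished();
        progress.jobFilteredOut();
        System.out.println(progress.launchMessage());
    }
}
```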




[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a "table type" enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806243#action_12806243
 ] 

Zheng Shao commented on HIVE-1068:
--

The meaning of isNativeSerDe(String) changed. Can we rename it to 
"shouldGetColsFromSerDe" (and invert the condition)?


> CREATE VIEW followup:  add a "table type" enum attribute in metastore's 
> MTable, and also null out irrelevant attributes for MTable instances which 
> describe views
> -
>
> Key: HIVE-1068
> URL: https://issues.apache.org/jira/browse/HIVE-1068
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1068.1.patch
>
>
> Zheng's description:
> 5. TODO: Metadata change: We store "view" definitions in the same metadata 
> table that we store "table" definitions.
> Shall we add a field "table type" so we know whether it's a table, external 
> table, view, or materialized view in the future.
> We should clean up the additional useless fields in "view" - the test output 
> shows that we are storing some garbage information for views.
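
As a rough sketch of the idea in the description, a "table type" field can be modelled as an enum, with attributes that do not apply to a given type left null. All names below (`TableType`, `TableRecord`, the enum constants) are hypothetical and are not the metastore's actual schema.

```java
final class TableTypeSketch {
    /** Hypothetical values, taken from the kinds listed in the description. */
    enum TableType { TABLE, EXTERNAL_TABLE, VIEW, MATERIALIZED_VIEW }

    /** Hypothetical, heavily reduced stand-in for a metastore table record. */
    static final class TableRecord {
        final String name;
        final TableType type;
        final String location; // storage path; not meaningful for a plain view
        final String viewText; // view definition; not meaningful for a table

        private TableRecord(String name, TableType type, String location, String viewText) {
            this.name = name;
            this.type = type;
            this.location = location;
            this.viewText = viewText;
        }

        static TableRecord table(String name, String location) {
            return new TableRecord(name, TableType.TABLE, location, null);
        }

        /** Views keep only what applies to them; storage attributes stay null. */
        static TableRecord view(String name, String viewText) {
            return new TableRecord(name, TableType.VIEW, null, viewText);
        }
    }

    public static void main(String[] args) {
        TableRecord v = TableRecord.view("v1", "SELECT * FROM src");
        System.out.println(v.name + " type=" + v.type + " location=" + v.location);
    }
}
```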




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1109:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Zheng!

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.
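
A minimal sketch of a directory name that carries the date and time, as proposed above. The layout (`hive_<timestamp>_<suffix>`), the UTC time zone, and the names `ScratchDirNames` and `scratchDir` are assumptions made for illustration; the patch may choose a different format.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class ScratchDirNames {
    /**
     * Builds a path of the form base/hive_yyyy-MM-dd_HH-mm-ss_SSS_suffix.
     * The suffix (for example a random number) keeps concurrent queries apart.
     */
    static String scratchDir(String base, Date now, long suffix) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS");
        // Fixed time zone so that names sort and compare consistently across hosts.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return base + "/hive_" + format.format(now) + "_" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(scratchDir("/tmp/hive-user", new Date(), 42L));
    }
}
```

With the timestamp in the name, a cleanup job can tell how old a leftover directory is from its name alone, and a resumed query can be pointed at the directory it created earlier.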




[jira] Created: (HIVE-1118) Hive merge map files should have different bytes/mapper setting

2010-01-28 Thread Zheng Shao (JIRA)
Hive merge map files should have different bytes/mapper setting
---

 Key: HIVE-1118
 URL: https://issues.apache.org/jira/browse/HIVE-1118
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao


Currently, by default, we get one reducer for each 1GB of input data.
It's also true for the conditional merge job that will run if the average file 
size is smaller than a threshold.

This actually makes those jobs very slow, because each reducer needs to consume 
1GB of data.

Alternatively, we can just use that threshold to determine the number of 
reducers per job (or introduce a new parameter).
Let's say the threshold is 1MB; then we only start the merge job if the 
average file size is less than 1MB, and the eventual result file size will be 
around 1MB (or another small number).

This will remove the extreme cases where we have thousands of empty files, but 
still make normal jobs fast enough.
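
The arithmetic behind this proposal can be sketched as follows. The class and method names are hypothetical and the code is not taken from Hive; it only shows how the reducer count changes when the merge job sizes its reducers by the small-file threshold instead of the 1GB default.

```java
final class MergeReducerEstimate {
    /** Number of reducers needed so that each one consumes at most bytesPerReducer. */
    static int reducersFor(long totalInputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        // Ceiling division, then clamp to the range [1, Integer.MAX_VALUE].
        long n = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min(n, Integer.MAX_VALUE));
    }

    /** The merge job only runs when the average file size is below the threshold. */
    static boolean shouldMerge(long totalInputBytes, long fileCount, long avgSizeThreshold) {
        return fileCount > 0 && totalInputBytes / fileCount < avgSizeThreshold;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        long oneGb = 1024L * oneMb;
        long total = 5000L * 100L * 1024L; // 5000 files of 100KB each

        System.out.println("merge needed: " + shouldMerge(total, 5000L, oneMb));
        System.out.println("reducers at 1GB each: " + reducersFor(total, oneGb));
        System.out.println("reducers at 1MB each: " + reducersFor(total, oneMb));
    }
}
```

For 5000 files of 100KB, the 1GB setting funnels everything through a single reducer, while sizing by the 1MB threshold spreads the same data over a few hundred reducers.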





[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806237#action_12806237
 ] 

Ning Zhang commented on HIVE-1109:
--

Zheng, fs_default_name2.q is still failing. It seems getExternalScratchDir needs 
to catch the IllegalParameterException as well.

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Attachment: HIVE-1109.3.patch

Thanks Ning. Fixed the 2 failed negative tests.

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806227#action_12806227
 ] 

Ning Zhang commented on HIVE-1109:
--

Zheng, two negative unit tests failed: fs_default_name1.q, fs_default_name2.q. 
Can you take a look?

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Assigned: (HIVE-705) Let Hive can analyse hbase's tables

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-705:
---

Assignee: John Sichi  (was: Samuel Guo)

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>Assignee: John Sichi
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806193#action_12806193
 ] 

John Sichi commented on HIVE-705:
-

I'm going to start working on getting this ready for submission against latest 
trunk.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>Assignee: John Sichi
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806183#action_12806183
 ] 

Carl Steinbach commented on HIVE-259:
-


@Jerome: Agreed. Allowing sort results to be shared by multiple functions (like 
in the following example) is key to supporting analytic functions efficiently.

{code:sql}
SELECT department_id,
   PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) 
  "Median cont",
   PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) 
  "Median disc"
   FROM employees GROUP BY department_id;
{code}

> Add PERCENTILE aggregate function
> -
>
> Key: HIVE-259
> URL: https://issues.apache.org/jira/browse/HIVE-259
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Venky Iyer
>
> Compute at least the 25th, 50th, and 75th percentiles




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806173#action_12806173
 ] 

Ning Zhang commented on HIVE-1109:
--

+1.

Will commit if tests pass.

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Resolved: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1110.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang

> add counters to show that skew join triggered
> -
>
> Key: HIVE-1110
> URL: https://issues.apache.org/jira/browse/HIVE-1110
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1110.2.patch, hive-1110.patch
>
>
> It would be very useful to debug, and quickly find out if the skew join was 
> triggered.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806169#action_12806169
 ] 

Zheng Shao commented on HIVE-1109:
--

There is no separate test for temp directories.
But every query we test uses the temp directories (which are changed by this 
patch), so I think as long as all the current tests pass, it should be good.


> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806167#action_12806167
 ] 

Ning Zhang commented on HIVE-1109:
--

Code changes look good. Is there a unit test already for temp directories?

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know what 
> are the temporary directories (in case the query failed to clean up) that can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will help a lot when we are able to resume a query that failed in the 
> middle, because we need to preserve the temporary directories for that query.




[jira] Updated: (HIVE-1092) Conditional task does not increase finished job counter when filter job out.

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1092:
---

Attachment: hive-1092.patch

> Conditional task does not increase finished job counter when filter job out.
> 
>
> Key: HIVE-1092
> URL: https://issues.apache.org/jira/browse/HIVE-1092
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1092.patch
>
>
> Ended Job = 864272330, job is filtered out (removed at runtime).
> Launching Job 2 out of 3 
> It should be "Launching Job 3 out of 3".




[jira] Commented: (HIVE-972) support views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806157#action_12806157
 ] 

John Sichi commented on HIVE-972:
-

Metastore upgrade instructions are here (they assume HIVE-1068 will be 
accepted):

http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades

There may be a more appropriate location for them in the wiki; once we have 
figured that out, we should link it from the release notes.


> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-972.1.patch, HIVE-972.2.patch, HIVE-972.3.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.




[jira] Created: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)
Make QueryPlan serializable
---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0


We need to make QueryPlan serializable so that we can resume the query some 
time later.





[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a "table type" enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806154#action_12806154
 ] 

John Sichi commented on HIVE-1068:
--

I have written up metastore upgrade instructions here:

http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades


> CREATE VIEW followup:  add a "table type" enum attribute in metastore's 
> MTable, and also null out irrelevant attributes for MTable instances which 
> describe views
> -
>
> Key: HIVE-1068
> URL: https://issues.apache.org/jira/browse/HIVE-1068
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1068.1.patch
>
>
> Zheng's description:
> 5. TODO: Metadata change: We store "view" definitions in the same metadata 
> table that we store "table" definitions.
> Shall we add a field "table type" so we know whether it's a table, external 
> table, view, or materialized view in the future.
> We should clean up the additional useless fields in "view" - the test output 
> shows that we are storing some garbage information for views.




[jira] Updated: (HIVE-1105) Add service script for starting metastore server

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1105:
-

   Resolution: Fixed
Fix Version/s: 0.5.0
 Release Note: 
HIVE-1105. Add service script for starting metastore server. (John Sichi via 
zshao)
New command for starting a Hive metastore server:
hive --service metastore


  was:
New command for starting a Hive metastore server:

hive --service metastore


 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to branch-0.5 and trunk. Thanks John!

Please mention this is available since 0.5.


> Add service script for starting metastore server
> 
>
> Key: HIVE-1105
> URL: https://issues.apache.org/jira/browse/HIVE-1105
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Affects Versions: 0.4.1
>Reporter: John Sichi
>Assignee: John Sichi
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: HIVE-1105.1.patch
>
>
> The instructions on this page recommend running Java directly in order to 
> start the metastore:
> http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
> Since we already have a generic service-starter script, it would be nice to 
> be able to do this instead:
> hive --service metastore
> I've written a metastore.sh for this purpose.




[jira] Commented: (HIVE-592) renaming internal table should rename HDFS and also change path of the table and partitions accordingly.

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806142#action_12806142
 ] 

Zheng Shao commented on HIVE-592:
-

Tried renaming table on both branch-0.4 and branch-0.5. Both worked as expected 
(locations are modified).


> renaming internal table should rename HDFS and also change path of the table 
> and partitions accordingly.
> 
>
> Key: HIVE-592
> URL: https://issues.apache.org/jira/browse/HIVE-592
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.4.0
>
> Attachments: hive-592.2.patch, hive-592.3.patch
>
>
> Renaming a table changes just the name of the table in the metastore but not 
> in HDFS. So if a table with the old name is created, it uses the HDFS 
> directory pointing to the renamed table.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Jerome Boulon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806134#action_12806134
 ] 

Jerome Boulon commented on HIVE-259:


It would also be good to be able to ask for more than one percentile, e.g. 
PERCENTILE(column, .99) and PERCENTILE(column, .50), while keeping only a 
single structure in memory.
Example: select PERCENTILE(column, .99), PERCENTILE(column, .50) from myTable;
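
A small sketch of answering several percentile requests from one shared structure. It assumes the nearest-rank definition of a percentile and an in-memory sorted array; the issue does not fix either choice, and `MultiPercentile` is a hypothetical name, not a Hive UDAF.

```java
import java.util.Arrays;

final class MultiPercentile {
    /**
     * Returns one nearest-rank percentile per requested fraction.
     * The values are sorted once and the sorted copy is shared by all fractions.
     */
    static double[] percentiles(double[] values, double... fractions) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        double[] result = new double[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            double f = fractions[i];
            if (f < 0.0 || f > 1.0) {
                throw new IllegalArgumentException("fraction out of range: " + f);
            }
            // Nearest rank: the smallest rank r with r / n >= f, and at least 1.
            int rank = Math.max(1, (int) Math.ceil(f * sorted.length));
            result[i] = sorted[rank - 1];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] salaries = {5, 1, 4, 2, 3};
        System.out.println(Arrays.toString(percentiles(salaries, 0.99, 0.50)));
    }
}
```

The cost of the sort is paid once no matter how many fractions are requested, which is the saving the comment asks for.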


> Add PERCENTILE aggregate function
> -
>
> Key: HIVE-259
> URL: https://issues.apache.org/jira/browse/HIVE-259
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Venky Iyer
>
> Compute atleast 25, 50, 75th percentiles




[jira] Updated: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1110:
---

Attachment: hive-1110.2.patch

Uploaded a new patch integrating Namit's suggestions. 
I did not know that counters from different tasks are eventually added together. 
Thanks Namit!

> add counters to show that skew join triggered
> -
>
> Key: HIVE-1110
> URL: https://issues.apache.org/jira/browse/HIVE-1110
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1110.2.patch, hive-1110.patch
>
>
> It would be very useful to debug, and quickly find out if the skew join was 
> triggered.




[jira] Updated: (HIVE-1105) Add service script for starting metastore server

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1105:
-

Release Note: 
New command for starting a Hive metastore server:

hive --service metastore

  Status: Patch Available  (was: Open)

I'll update wiki docs once this patch is accepted.

> Add service script for starting metastore server
> 
>
> Key: HIVE-1105
> URL: https://issues.apache.org/jira/browse/HIVE-1105
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Affects Versions: 0.4.1
>Reporter: John Sichi
>Assignee: John Sichi
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1105.1.patch
>
>
> The instructions on this page recommend running Java directly in order to 
> start the metastore:
> http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
> Since we already have a generic service-starter script, it would be nice to 
> be able to do this instead:
> hive --service metastore
> I've written a metastore.sh for this purpose.




[jira] Assigned: (HIVE-1105) Add service script for starting metastore server

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1105:


Assignee: John Sichi

> Add service script for starting metastore server
> 
>
> Key: HIVE-1105
> URL: https://issues.apache.org/jira/browse/HIVE-1105
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Affects Versions: 0.4.1
>Reporter: John Sichi
>Assignee: John Sichi
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1105.1.patch
>
>
> The instructions on this page recommend running Java directly in order to 
> start the metastore:
> http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
> Since we already have a generic service-starter script, it would be nice to 
> be able to do this instead:
> hive --service metastore
> I've written a metastore.sh for this purpose.




[jira] Assigned: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1115:


Assignee: Paul Yang  (was: Namit Jain)

> optimize combinehiveinputformat in presence of many partitions
> --
>
> Key: HIVE-1115
> URL: https://issues.apache.org/jira/browse/HIVE-1115
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
>
> A query like:
> select ..  from T where ...
> where T contains a very large number of partitions does not work well 
> with CombineHiveInputFormat.
> A pool is created per directory, which leads to a high number of mappers.
> If all partitions share the same operator tree and the same partition 
> description, only a single pool should be created.
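
The pooling idea in the description above can be sketched roughly as follows. This is a Python illustration only, not the CombineHiveInputFormat code; `build_pools` and the tuple layout (directory, operator tree, partition description) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def build_pools(partitions):
    """Group partition directories by (operator tree, partition description).

    One pool per distinct key, instead of one pool per directory.
    """
    pools = defaultdict(list)
    for directory, operator_tree, part_desc in partitions:
        pools[(operator_tree, part_desc)].append(directory)
    return dict(pools)
```

With this grouping, a table whose partitions all share one operator tree and one partition description ends up in a single pool, however many directories it has.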




[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a "table type" enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806119#action_12806119
 ] 

John Sichi commented on HIVE-1068:
--

See the updated output for create_view.q.out for an example of what we'll be 
storing in the view descriptor after this change.

> CREATE VIEW followup:  add a "table type" enum attribute in metastore's 
> MTable, and also null out irrelevant attributes for MTable instances which 
> describe views
> -
>
> Key: HIVE-1068
> URL: https://issues.apache.org/jira/browse/HIVE-1068
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1068.1.patch
>
>
> Zheng's description:
> 5. TODO: Metadata change: We store "view" definitions in the same metadata 
> table in which we store "table" definitions.
> Shall we add a "table type" field so we know whether it's a table, external 
> table, view, or (in the future) materialized view?
> We should clean up the additional useless fields in "view" - the test output 
> shows that we are storing some garbage information for views.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Status: Patch Available  (was: Open)

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know which 
> temporary directories (left behind when a query failed to clean up) can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will also help a lot once we are able to resume a query that failed in 
> the middle, because we need to preserve the temporary directories for that 
> query.
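
A minimal sketch of what date-and-time-based naming makes possible, written in Python for illustration. The `hive_<date>_<time>_<id>` layout and the helper names are assumptions for this example, not the naming scheme used by the patch.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def scratch_dir_name(now, query_id):
    """Build a directory name that carries its creation date and time."""
    return "hive_{}_{}".format(now.strftime(STAMP_FORMAT), query_id)


def created_at(name):
    """Recover the creation time from a name built by scratch_dir_name."""
    parts = name.split("_")
    return datetime.strptime(parts[1] + "_" + parts[2], STAMP_FORMAT)


def is_stale(name, now, max_age):
    """True if the directory is older than max_age and is a cleanup candidate."""
    return now - created_at(name) > max_age
```

Because the timestamp is part of the name, a cleanup job can decide which leftover directories are old enough to delete without consulting any other state.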




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Attachment: HIVE-1109.2.patch

Incorporated the changes needed for HIVE-1113.

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know which 
> temporary directories (left behind when a query failed to clean up) can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will also help a lot once we are able to resume a query that failed in 
> the middle, because we need to preserve the temporary directories for that 
> query.




[jira] Updated: (HIVE-1068) CREATE VIEW followup: add a "table type" enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1068:
-

Attachment: HIVE-1068.1.patch

Notes for reviewers:

(1)

I introduced the new TableType enum in package o.a.h.h.metastore.  I didn't put 
it in metastore.api since it is not a generated file, and I didn't put it in 
metastore.model since that isn't accessed by most of Hive.  If there's a better 
place to put it, please let me know.

For the enum values, I used MANAGED_TABLE, EXTERNAL_TABLE, and VIRTUAL_VIEW.  
This is so that later if we add MATERIALIZED_VIEW, there won't be any 
confusion.  If we eventually use this for implementing the TABLE_TYPE attribute 
in the ResultSet for JDBC's getTables, we can use a CASE expression to fold 
these into simpler names (e.g. JDBC uses "TABLE" and "VIEW").

(2)

I tried making the SerDe itself null, but that caused problems in too many 
places that were requiring it to be non-null.  So instead I nulled/emptied all 
of its attributes.
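
To illustrate note (1), here is a rough Python sketch of the three enum values and of folding them into the simpler JDBC-style names. The real TableType is a Java enum in the patch; the helper `jdbc_table_type`, and the choice to map both table kinds to "TABLE", are assumptions for this example.

```python
from enum import Enum


class TableType(Enum):
    MANAGED_TABLE = "MANAGED_TABLE"
    EXTERNAL_TABLE = "EXTERNAL_TABLE"
    VIRTUAL_VIEW = "VIRTUAL_VIEW"


def jdbc_table_type(table_type):
    """Fold the metastore table type into a simpler JDBC-style name."""
    if table_type is TableType.VIRTUAL_VIEW:
        return "VIEW"
    # Assumption: managed and external tables both report as plain tables.
    return "TABLE"
```

Keeping VIRTUAL_VIEW distinct from a future MATERIALIZED_VIEW in the stored value, and simplifying only at the reporting layer, avoids having to migrate stored metadata later.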


> CREATE VIEW followup:  add a "table type" enum attribute in metastore's 
> MTable, and also null out irrelevant attributes for MTable instances which 
> describe views
> -
>
> Key: HIVE-1068
> URL: https://issues.apache.org/jira/browse/HIVE-1068
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1068.1.patch
>
>
> Zheng's description:
> 5. TODO: Metadata change: We store "view" definitions in the same metadata 
> table in which we store "table" definitions.
> Shall we add a "table type" field so we know whether it's a table, external 
> table, view, or (in the future) materialized view?
> We should clean up the additional useless fields in "view" - the test output 
> shows that we are storing some garbage information for views.




[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806073#action_12806073
 ] 

Joydeep Sen Sarma commented on HIVE-1116:
-

Yeah, positive. It might work once, but if you repeat the recipe it will 
fail, because 'a' points to 'a_tmp''s location the next time.

> alter table rename should rename hdfs location of table as well
> ---
>
> Key: HIVE-1116
> URL: https://issues.apache.org/jira/browse/HIVE-1116
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If the location is not an external location, this would be safer.
> The problem right now is that it's tricky to use the drop-and-rename way of 
> writing new data into a table. Consider:
> Initialization block:
> drop table a_tmp;
> create table a_tmp like a;
> Loading block:
> load data  into a_tmp;
> drop table a;
> alter table a_tmp rename to a;
> This looks safe, but it's not. If one runs this multiple times, data is 
> lost: after any iteration 'a' points to 'a_tmp''s location, so dropping 
> table 'a' in the next iteration blows away the newly loaded data. 
> If the location is being managed by Hive, then 'rename' should switch the 
> location as well.




[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806075#action_12806075
 ] 

Joydeep Sen Sarma commented on HIVE-1116:
-

Interesting - I am hitting this on our internal clusters. 592 seems pretty old.

> alter table rename should rename hdfs location of table as well
> ---
>
> Key: HIVE-1116
> URL: https://issues.apache.org/jira/browse/HIVE-1116
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If the location is not an external location, this would be safer.
> The problem right now is that it's tricky to use the drop-and-rename way of 
> writing new data into a table. Consider:
> Initialization block:
> drop table a_tmp;
> create table a_tmp like a;
> Loading block:
> load data  into a_tmp;
> drop table a;
> alter table a_tmp rename to a;
> This looks safe, but it's not. If one runs this multiple times, data is 
> lost: after any iteration 'a' points to 'a_tmp''s location, so dropping 
> table 'a' in the next iteration blows away the newly loaded data. 
> If the location is being managed by Hive, then 'rename' should switch the 
> location as well.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Status: Open  (was: Patch Available)

> Structured temporary directories
> 
>
> Key: HIVE-1109
> URL: https://issues.apache.org/jira/browse/HIVE-1109
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1109.1.patch
>
>
> Currently Hive execution uses a lot of temporary directories. These 
> directories are NOT named by date or time, so it's impossible to know which 
> temporary directories (left behind when a query failed to clean up) can 
> be deleted safely.
> We should have a better temporary directory structure, with the date and time 
> in the directory name.
> This will also help a lot once we are able to resume a query that failed in 
> the middle, because we need to preserve the temporary directories for that 
> query.




[jira] Updated: (HIVE-1112) Replace instances of StringBuffer/Vector with StringBuilder/ArrayList

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1112:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1112. Replace instances of StringBuffer and Vector with 
StringBuilder and ArrayList. (Carl Steinbach via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Carl!


> Replace instances of StringBuffer/Vector with StringBuilder/ArrayList
> -
>
> Key: HIVE-1112
> URL: https://issues.apache.org/jira/browse/HIVE-1112
> Project: Hadoop Hive
>  Issue Type: Task
>Affects Versions: 0.6.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1112.2.patch, HIVE-1112.3.patch, HIVE-1112.patch
>
>
> When possible replace instances of StringBuffer and Vector with their 
> non-synchronized counterparts StringBuilder and ArrayList.




[jira] Commented: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806047#action_12806047
 ] 

He Yongqiang commented on HIVE-1110:


By introducing a boolean vector to keep track of which tables have already seen 
a skew key, each reducer can tell how many tables have skew keys. Its counter 
then gives the minimum number of skew join jobs that will be started. So if we 
take the largest counter across all reducers, it will be the number of final 
jobs needed.

>>just increment the counter every time you see a new key.
This may be better, because I have sometimes seen the counter be inaccurate: 
even though there was a skew key and the counter got updated, it still reported 
zero. So incrementing the counter multiple times may make it more likely that 
the reducer reports a non-zero counter.
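
The two ways of reading the counters discussed in this thread can be sketched as follows. This is a Python illustration of the reasoning only, not the patch; `tables_with_skew`, `summarize`, and the list-of-tags input format are hypothetical.

```python
def tables_with_skew(skewed_tags, num_tables):
    """Per-reducer counter: number of distinct tables (tags) that saw a skew key."""
    seen = [False] * num_tables   # the boolean vector, one slot per table
    for tag in skewed_tags:
        seen[tag] = True
    return sum(seen)


def summarize(reducers, num_tables):
    """Compare the per-reducer maximum with the summed counter."""
    per_reducer = [tables_with_skew(tags, num_tables) for tags in reducers]
    total = sum(per_reducer)      # counters from different tasks are added together
    return {
        "per_reducer": per_reducer,
        "max": max(per_reducer, default=0),
        "sum": total,
        "triggered": total > 0,   # a non-zero value is enough to show a skew join ran
    }
```

Since the framework reports only the sum across tasks, the per-reducer maximum is not directly observable from the final counter, which is why a simple non-zero check is the more robust signal.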

> add counters to show that skew join triggered
> -
>
> Key: HIVE-1110
> URL: https://issues.apache.org/jira/browse/HIVE-1110
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1110.patch
>
>
> It would be very useful for debugging to quickly find out whether the skew 
> join was triggered.




[jira] Assigned: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1115:


Assignee: Namit Jain

> optimize combinehiveinputformat in presence of many partitions
> --
>
> Key: HIVE-1115
> URL: https://issues.apache.org/jira/browse/HIVE-1115
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> A query like:
> select ..  from T where ...
> where T contains a very large number of partitions does not work well 
> with CombineHiveInputFormat.
> A pool is created per directory, which leads to a high number of mappers.
> If all partitions share the same operator tree and the same partition 
> description, only a single pool should be created.




[jira] Commented: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805999#action_12805999
 ] 

Namit Jain commented on HIVE-1110:
--

I am not sure about the semantics of counters - why do you need a bitvector for 
keeping track of which one got updated?
Isn't a non-zero value good enough?

Is the assumption that only 1 reducer will update a particular tag? Even then 
you will only be able to know the number
of skew joins - in that case, why do you need the tag array? Just increment the 
counter every time you see a new key.
Even if you update it multiple times, it is OK.


> add counters to show that skew join triggered
> -
>
> Key: HIVE-1110
> URL: https://issues.apache.org/jira/browse/HIVE-1110
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1110.patch
>
>
> It would be very useful for debugging to quickly find out whether the skew 
> join was triggered.




[jira] Updated: (HIVE-1093) Add a "skew join map join size" variable to control the input size of skew join's following map join job.

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1093:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang

> Add a "skew join map join size" variable to control the input size of skew 
> join's following map join job.
> -
>
> Key: HIVE-1093
> URL: https://issues.apache.org/jira/browse/HIVE-1093
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1093.2.patch, hive-1093.patch
>
>
> In a test, many skew join keys were each >250M in size, and the following 
> map join took several hours to process those big skew keys. 
> This can be improved by using a smaller map input size for the following map 
> join job.




[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805985#action_12805985
 ] 

Prasad Chakka commented on HIVE-1116:
-

I remember doing this quite some time ago (only for non-external tables). Are 
you sure it doesn't work in any scenario?

> alter table rename should rename hdfs location of table as well
> ---
>
> Key: HIVE-1116
> URL: https://issues.apache.org/jira/browse/HIVE-1116
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If the location is not an external location, this would be safer.
> The problem right now is that it's tricky to use the drop-and-rename way of 
> writing new data into a table. Consider:
> Initialization block:
> drop table a_tmp;
> create table a_tmp like a;
> Loading block:
> load data  into a_tmp;
> drop table a;
> alter table a_tmp rename to a;
> This looks safe, but it's not. If one runs this multiple times, data is 
> lost: after any iteration 'a' points to 'a_tmp''s location, so dropping 
> table 'a' in the next iteration blows away the newly loaded data. 
> If the location is being managed by Hive, then 'rename' should switch the 
> location as well.




[jira] Created: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Joydeep Sen Sarma (JIRA)
alter table rename should rename hdfs location of table as well
---

 Key: HIVE-1116
 URL: https://issues.apache.org/jira/browse/HIVE-1116
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


If the location is not an external location, this would be safer.

The problem right now is that it's tricky to use the drop-and-rename way of 
writing new data into a table. Consider:

Initialization block:
drop table a_tmp;
create table a_tmp like a;

Loading block:
load data  into a_tmp;
drop table a;
alter table a_tmp rename to a;

This looks safe, but it's not. If one runs this multiple times, data is 
lost: after any iteration 'a' points to 'a_tmp''s location, so dropping 
table 'a' in the next iteration blows away the newly loaded data. 

If the location is being managed by Hive, then 'rename' should switch the 
location as well.
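
The failure mode described above can be reproduced with a toy model. The Python sketch below is only a simulation under simplifying assumptions: the `Warehouse` class, its `/warehouse/<name>` paths, and the `rename_moves_location` switch are hypothetical and do not correspond to Hive's metastore code.

```python
class Warehouse:
    """Toy model of managed tables: a name -> directory map plus directory contents."""

    def __init__(self, rename_moves_location):
        self.rename_moves_location = rename_moves_location
        self.location = {}  # table name -> directory
        self.files = {}     # directory -> rows loaded into it

    def create(self, name):
        path = "/warehouse/" + name
        self.location[name] = path
        self.files.setdefault(path, [])

    def drop(self, name):
        path = self.location.pop(name, None)
        if path is not None:
            self.files.pop(path, None)  # managed table: its data is deleted too

    def load(self, name, rows):
        self.files[self.location[name]].extend(rows)

    def rename(self, old, new):
        path = self.location.pop(old)
        if self.rename_moves_location:
            new_path = "/warehouse/" + new
            self.files[new_path] = self.files.pop(path)
            path = new_path
        self.location[new] = path

    def read(self, name):
        # None means the table's directory no longer exists.
        return self.files.get(self.location[name])


def refresh(wh, rows):
    """The drop-and-rename recipe from the issue description."""
    wh.drop("a_tmp")
    wh.create("a_tmp")
    wh.load("a_tmp", rows)
    wh.drop("a")
    wh.rename("a_tmp", "a")
```

Running `refresh` twice on a warehouse created with `rename_moves_location=False` leaves table 'a' pointing at a directory that the second `drop` removed, while the same two runs with `rename_moves_location=True` keep the most recently loaded rows.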












Build failed in Hudson: Hive-trunk-h0.18 #350

2010-01-28 Thread Apache Hudson Server
See 

Changes:

[zshao] HIVE-1106. Support ALTER TABLE t ADD IF NOT EXIST PARTITION. (Paul Yang 
via zshao)

[namit] HIVE-1108. Make QueryPlan serializable
(Zheng Shao via namit)

--
[...truncated 13047 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[j

[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-01-28 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1019:
---

Status: In Progress  (was: Patch Available)

Just noticed I missed something when merging from the previous version. In some 
cases you still get this error.

> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> 
>
> Key: HIVE-1019
> URL: https://issues.apache.org/jira/browse/HIVE-1019
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Bennie Schut
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019.patch, 
> stacktrace2.txt
>
>
> I keep getting errors like this:
> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> and :
> java.io.IOException: cannot find dir = 
> hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
> partToPartitionInfo!
> when running multiple threads with roughly similar queries.
> I have a patch for this which works for me.




[jira] Commented: (HIVE-1112) Replace instances of StringBuffer/Vector with StringBuilder/ArrayList

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805828#action_12805828
 ] 

Zheng Shao commented on HIVE-1112:
--

+1. Will commit after test passes.

> Replace instances of StringBuffer/Vector with StringBuilder/ArrayList
> -
>
> Key: HIVE-1112
> URL: https://issues.apache.org/jira/browse/HIVE-1112
> Project: Hadoop Hive
>  Issue Type: Task
>Affects Versions: 0.6.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1112.2.patch, HIVE-1112.3.patch, HIVE-1112.patch
>
>
> When possible replace instances of StringBuffer and Vector with their 
> non-synchronized counterparts StringBuilder and ArrayList.
