[jira] Commented: (HIVE-1112) Replace instances of StringBuffer/Vector with StringBuilder/ArrayList

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805828#action_12805828
 ] 

Zheng Shao commented on HIVE-1112:
--

+1. Will commit after test passes.

 Replace instances of StringBuffer/Vector with StringBuilder/ArrayList
 -

 Key: HIVE-1112
 URL: https://issues.apache.org/jira/browse/HIVE-1112
 Project: Hadoop Hive
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1112.2.patch, HIVE-1112.3.patch, HIVE-1112.patch


 When possible replace instances of StringBuffer and Vector with their 
 non-synchronized counterparts StringBuilder and ArrayList.
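
The substitution described above can be sketched as follows. This is a minimal illustration, not code from the attached patches: the class and method names are hypothetical, and it assumes the builder and the list are never shared across threads, which is the "when possible" condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the HIVE-1112 patches.
final class JoinColumns {
    // Before: a Vector<String> argument and a StringBuffer local, both of
    // which take a lock on every call.
    // After: List/ArrayList and StringBuilder, which skip the locking. This
    // is only safe while the objects are not shared across threads.
    static String join(List<String> columns, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(columns.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<String>();
        columns.add("key");
        columns.add("value");
        System.out.println(join(columns, ", "));
    }
}
```

Code that really does share the buffer or list between threads should keep the synchronized classes or add its own locking.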

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-01-28 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1019:
---

Status: In Progress  (was: Patch Available)

Just noticed I missed something when merging from the previous version. In some 
cases you still get this error.

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Bennie Schut
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019.patch, 
 stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.




Build failed in Hudson: Hive-trunk-h0.18 #350

2010-01-28 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/350/changes

Changes:

[zshao] HIVE-1106. Support ALTER TABLE t ADD IF NOT EXIST PARTITION. (Paul Yang 
via zshao)

[namit] HIVE-1108. Make QueryPlan serializable
(Zheng Shao via namit)

--
[...truncated 13047 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out
[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
 

[jira] Commented: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805999#action_12805999
 ] 

Namit Jain commented on HIVE-1110:
--

I am not sure about the semantics of the counters - why do you need a bit vector 
for keeping track of which one got updated?
Isn't a non-zero value good enough?

Is the assumption that only one reducer will update a particular tag? Even then 
you will only be able to know the number of skew joins - in that case, why do 
you need the tag array? Just increment the counter every time you see a new key.
Even if you update it multiple times, it is OK.


 add counters to show that skew join triggered
 -

 Key: HIVE-1110
 URL: https://issues.apache.org/jira/browse/HIVE-1110
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1110.patch


 It would be very useful to debug, and quickly find out if the skew join was 
 triggered.




[jira] Assigned: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1115:


Assignee: Namit Jain

 optimize combinehiveinputformat in presence of many partitions
 --

 Key: HIVE-1115
 URL: https://issues.apache.org/jira/browse/HIVE-1115
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 A query like :
 select ..  from T where ...
 where T contains a very large number of partitions does not work very well 
 with CombineHiveInputFormat.
 A pool is created per directory, which leads to a high number of mappers.
 In case all partitions share the same operator tree, and the same partition 
 description, only a single pool should be created.
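
The grouping proposed above can be sketched roughly as follows. This is a simplified illustration and does not use the real CombineHiveInputFormat API: the Partition class and the string identifiers for the operator tree and partition description are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Hive under these names.
final class PoolGrouping {
    static final class Partition {
        final String dir;
        final String operatorTree;   // stand-in for the operator tree identity
        final String partitionDesc;  // stand-in for the partition description

        Partition(String dir, String operatorTree, String partitionDesc) {
            this.dir = dir;
            this.operatorTree = operatorTree;
            this.partitionDesc = partitionDesc;
        }
    }

    // One pool per distinct (operator tree, partition description) pair,
    // instead of one pool per directory.
    static Map<String, List<String>> pools(List<Partition> partitions) {
        Map<String, List<String>> pools = new LinkedHashMap<>();
        for (Partition p : partitions) {
            String key = p.operatorTree + "|" + p.partitionDesc;
            List<String> dirs = pools.get(key);
            if (dirs == null) {
                dirs = new ArrayList<>();
                pools.put(key, dirs);
            }
            dirs.add(p.dir);
        }
        return pools;
    }
}
```

With this grouping, a table whose partitions all share one operator tree and one partition description ends up in a single pool regardless of how many directories it has.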




[jira] Commented: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806047#action_12806047
 ] 

He Yongqiang commented on HIVE-1110:


By introducing a boolean vector to keep track of which tables have already seen a 
skew key, each reducer can tell how many tables have skew keys. The counter in 
that reducer then gives the minimum number of skew jobs that will be started. So 
if we take the largest counter across all reducers, it will be the number of 
final jobs needed.

"just increment the counter every time you see a new key."
This may be better, because I have sometimes seen the counter be inaccurate: even 
though there is a skew key and the counter got updated, it still reports zero. 
So it may be better to increment the counter multiple times, which will 
hopefully let the reducer report a non-zero counter.

 add counters to show that skew join triggered
 -

 Key: HIVE-1110
 URL: https://issues.apache.org/jira/browse/HIVE-1110
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1110.patch


 It would be very useful to debug, and quickly find out if the skew join was 
 triggered.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Status: Open  (was: Patch Available)

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.
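
One possible shape for such a directory name is sketched below. The layout (a "hive_" prefix, a timestamp, then a query id) is an assumption made for illustration and is not taken from the attached patch; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of a timestamped scratch directory name.
final class ScratchDirNames {
    private static final String PREFIX = "hive_";
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");
    private static final int STAMP_LENGTH = 19; // length of e.g. 2010-01-28_13-05-09

    // Builds <root>/hive_<timestamp>_<queryId>.
    static String scratchDir(String root, LocalDateTime startedAt, String queryId) {
        return root + "/" + PREFIX + startedAt.format(STAMP) + "_" + queryId;
    }

    // Recovers the start time from a directory created by scratchDir.
    static LocalDateTime startedAt(String dir) {
        String name = dir.substring(dir.lastIndexOf('/') + 1);
        String stamp = name.substring(PREFIX.length(), PREFIX.length() + STAMP_LENGTH);
        return LocalDateTime.parse(stamp, STAMP);
    }

    // A cleanup job could delete only directories older than some cutoff.
    static boolean olderThan(String dir, LocalDateTime cutoff) {
        return startedAt(dir).isBefore(cutoff);
    }
}
```

Because the timestamp is part of the name, a cleanup job can decide what is stale without reading any directory contents, and directories that belong to a resumable query can be skipped by query id.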




[jira] Updated: (HIVE-1112) Replace instances of StringBuffer/Vector with StringBuilder/ArrayList

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1112:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1112. Replace instances of StringBuffer and Vector with 
StringBuilder and ArrayList. (Carl Steinbach via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Carl!


 Replace instances of StringBuffer/Vector with StringBuilder/ArrayList
 -

 Key: HIVE-1112
 URL: https://issues.apache.org/jira/browse/HIVE-1112
 Project: Hadoop Hive
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0

 Attachments: HIVE-1112.2.patch, HIVE-1112.3.patch, HIVE-1112.patch


 When possible replace instances of StringBuffer and Vector with their 
 non-synchronized counterparts StringBuilder and ArrayList.




[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806075#action_12806075
 ] 

Joydeep Sen Sarma commented on HIVE-1116:
-

Interesting - I am hitting this on our internal clusters. HIVE-592 seems pretty old.

 alter table rename should rename hdfs location of table as well
 ---

 Key: HIVE-1116
 URL: https://issues.apache.org/jira/browse/HIVE-1116
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma

 If the location is not an external location, this would be safer.
 The problem right now is that it is tricky to use the drop-and-rename way of 
 writing new data into a table. Consider:
 Initialization block:
 drop table a_tmp;
 create table a_tmp like a;
 Loading block:
 load data newdata into a_tmp;
 drop table a;
 alter table a_tmp rename to a;
 This looks safe, but it is not. If one runs this multiple times, data is 
 lost: after any iteration 'a' points to the location of 'a_tmp', so dropping 
 table 'a' in the next iteration blows away the data that was just loaded. 
 If the location is managed by Hive, then 'rename' should switch the 
 location as well.
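
The failure mode described above can be reproduced with a toy model. The sketch below is not Hive code: it models the metastore as a map from table name to directory and HDFS as a map from directory to file names, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the recipe; not the Hive metastore API.
final class RenameSketch {
    private final Map<String, String> location = new HashMap<>();     // table -> dir
    private final Map<String, List<String>> files = new HashMap<>();  // dir -> files
    private final boolean renameMovesLocation;

    RenameSketch(boolean renameMovesLocation) {
        this.renameMovesLocation = renameMovesLocation;
    }

    void create(String table) {
        String dir = "/warehouse/" + table;
        location.put(table, dir);
        if (!files.containsKey(dir)) {
            files.put(dir, new ArrayList<String>());
        }
    }

    void load(String table, String file) {
        files.get(location.get(table)).add(file);
    }

    // Dropping a managed table also deletes its directory.
    void drop(String table) {
        String dir = location.remove(table);
        if (dir != null) {
            files.remove(dir);
        }
    }

    void rename(String from, String to) {
        String dir = location.remove(from);
        if (renameMovesLocation) {
            String newDir = "/warehouse/" + to;
            files.put(newDir, files.remove(dir));
            dir = newDir;
        }
        location.put(to, dir);
    }

    List<String> read(String table) {
        List<String> found = files.get(location.get(table));
        return found == null ? new ArrayList<String>() : found;
    }

    // Runs the initialization and loading blocks the given number of times
    // and returns what table 'a' contains afterwards.
    static List<String> runRecipe(boolean renameMovesLocation, int iterations) {
        RenameSketch m = new RenameSketch(renameMovesLocation);
        m.create("a");
        for (int i = 1; i <= iterations; i++) {
            m.drop("a_tmp");
            m.create("a_tmp");
            m.load("a_tmp", "batch" + i);
            m.drop("a");
            m.rename("a_tmp", "a");
        }
        return m.read("a");
    }
}
```

In this model, runRecipe(false, 2) leaves 'a' empty because the second "drop table a" deletes the directory that 'a_tmp' was just loaded into, while runRecipe(true, 2) leaves the second batch in place.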




[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806074#action_12806074
 ] 

Joydeep Sen Sarma commented on HIVE-1116:
-

Yeah, positive. It might work once, but if you repeat the recipe it will 
fail, because 'a' points to the location of 'a_tmp' the next time.

 alter table rename should rename hdfs location of table as well
 ---

 Key: HIVE-1116
 URL: https://issues.apache.org/jira/browse/HIVE-1116
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma

 If the location is not an external location, this would be safer.
 The problem right now is that it is tricky to use the drop-and-rename way of 
 writing new data into a table. Consider:
 Initialization block:
 drop table a_tmp;
 create table a_tmp like a;
 Loading block:
 load data newdata into a_tmp;
 drop table a;
 alter table a_tmp rename to a;
 This looks safe, but it is not. If one runs this multiple times, data is 
 lost: after any iteration 'a' points to the location of 'a_tmp', so dropping 
 table 'a' in the next iteration blows away the data that was just loaded. 
 If the location is managed by Hive, then 'rename' should switch the 
 location as well.




[jira] Updated: (HIVE-1068) CREATE VIEW followup: add a table type enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1068:
-

Attachment: HIVE-1068.1.patch

Notes for reviewers:

(1)

I introduced the new TableType enum in package o.a.h.h.metastore.  I didn't put 
it in metastore.api since it is not a generated file, and I didn't put it in 
metastore.model since that isn't accessed by most of Hive.  If there's a better 
place to put it, please let me know.

For the enum values, I used MANAGED_TABLE, EXTERNAL_TABLE, and VIRTUAL_VIEW.  
This is so that later if we add MATERIALIZED_VIEW, there won't be any 
confusion.  If we eventually use this for implementing the TABLE_TYPE attribute 
in the ResultSet for JDBC's getTables, we can use a CASE expression to fold 
these into simpler names (e.g. JDBC uses TABLE and VIEW).

(2)

I tried making the SerDe itself null, but that caused problems in too many 
places that were requiring it to be non-null.  So instead I nulled/emptied all 
of its attributes.
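
As an illustration of note (1), the enum and the folding into JDBC-style names might look roughly like this. The three values are the ones listed above; the jdbcName helper is hypothetical, and mapping EXTERNAL_TABLE to TABLE is an assumption (the note proposes doing the folding with a CASE expression, not in Java).

```java
// Sketch only; the actual enum is in the attached patch.
enum TableType {
    MANAGED_TABLE,
    EXTERNAL_TABLE,
    VIRTUAL_VIEW;

    // Hypothetical helper: folds the metastore values into the simpler
    // names that JDBC clients expect.
    String jdbcName() {
        switch (this) {
            case VIRTUAL_VIEW:
                return "VIEW";
            default:
                return "TABLE";
        }
    }
}
```

A later MATERIALIZED_VIEW value would need its own case here, which is one reason to keep the stored names more specific than the JDBC ones.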


 CREATE VIEW followup:  add a table type enum attribute in metastore's 
 MTable, and also null out irrelevant attributes for MTable instances which 
 describe views
 -

 Key: HIVE-1068
 URL: https://issues.apache.org/jira/browse/HIVE-1068
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1068.1.patch


 Zheng's description:
 5. TODO: Metadata change: We store view definitions in the same metadata 
 table that we store table definitions.
 Shall we add a field table type so we know whether it's a table, external 
 table, view, or materialized view in the future.
 We should clean up the additional useless fields in view - the test output 
 shows that we are storing some garbage information for views.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Attachment: HIVE-1109.2.patch

Incorporated the changes needed for HIVE-1113.

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Status: Patch Available  (was: Open)

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a table type enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806119#action_12806119
 ] 

John Sichi commented on HIVE-1068:
--

See the updated output for create_view.q.out for an example of what we'll be 
storing in the view descriptor after this change.

 CREATE VIEW followup:  add a table type enum attribute in metastore's 
 MTable, and also null out irrelevant attributes for MTable instances which 
 describe views
 -

 Key: HIVE-1068
 URL: https://issues.apache.org/jira/browse/HIVE-1068
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1068.1.patch


 Zheng's description:
 5. TODO: Metadata change: We store view definitions in the same metadata 
 table that we store table definitions.
 Shall we add a field table type so we know whether it's a table, external 
 table, view, or materialized view in the future.
 We should clean up the additional useless fields in view - the test output 
 shows that we are storing some garbage information for views.




[jira] Assigned: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1115:


Assignee: Paul Yang  (was: Namit Jain)

 optimize combinehiveinputformat in presence of many partitions
 --

 Key: HIVE-1115
 URL: https://issues.apache.org/jira/browse/HIVE-1115
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang

 A query like :
 select ..  from T where ...
 where T contains a very large number of partitions does not work very well 
 with CombineHiveInputFormat.
 A pool is created per directory, which leads to a high number of mappers.
 In case all partitions share the same operator tree, and the same partition 
 description, only a single pool should be created.




[jira] Updated: (HIVE-1105) Add service script for starting metastore server

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1105:
-

Release Note: 
New command for starting a Hive metastore server:

hive --service metastore

  Status: Patch Available  (was: Open)

I'll update wiki docs once this patch is accepted.

 Add service script for starting metastore server
 

 Key: HIVE-1105
 URL: https://issues.apache.org/jira/browse/HIVE-1105
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.4.1
Reporter: John Sichi
Assignee: John Sichi
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1105.1.patch


 The instructions on this page recommend running Java directly in order to 
 start the metastore:
 http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
 Since we already have a generic service-starter script, it would be nice to 
 be able to do this instead:
 hive --service metastore
 I've written a metastore.sh for this purpose.




[jira] Updated: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1110:
---

Attachment: hive-1110.2.patch

Uploaded a new patch integrating Namit's suggestions. 
I did not know that counters from different tasks are eventually added together. 
Thanks Namit!
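
The point about counters being added together can be illustrated with a small simulation. It does not use the Hadoop counter API: each array stands for the per-key row counts seen by one reducer, and the threshold and all names are hypothetical.

```java
import java.util.List;

// Plain-Java simulation of per-task counters summed by the framework.
final class SkewCounterSketch {
    // One reducer: increment the counter once for every key whose row
    // count exceeds the skew threshold.
    static long skewKeysSeen(long[] rowsPerKey, long threshold) {
        long counter = 0;
        for (long rows : rowsPerKey) {
            if (rows > threshold) {
                counter++;
            }
        }
        return counter;
    }

    // Stand-in for the framework's aggregation: the values reported by the
    // individual tasks are summed into one job-level counter.
    static long aggregate(List<long[]> reducers, long threshold) {
        long total = 0;
        for (long[] rowsPerKey : reducers) {
            total += skewKeysSeen(rowsPerKey, threshold);
        }
        return total;
    }
}
```

Because the per-task values are summed, a non-zero job-level total is enough to show that the skew join was triggered, with no need to track which task updated which tag.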

 add counters to show that skew join triggered
 -

 Key: HIVE-1110
 URL: https://issues.apache.org/jira/browse/HIVE-1110
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1110.2.patch, hive-1110.patch


 It would be very useful to debug, and quickly find out if the skew join was 
 triggered.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Jerome Boulon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806134#action_12806134
 ] 

Jerome Boulon commented on HIVE-259:


It would also be good to be able to request more than one percentile, e.g. 
PERCENTILE(column, .99), while keeping only a single structure in memory.
Example: select PERCENTILE(column, .99), PERCENTILE(column, .50) from myTable;
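
A rough sketch of answering several percentile requests from one in-memory structure follows. It uses the nearest-rank definition over a sorted copy of the values, which is an assumption (the issue does not say which percentile definition to use), and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; not the aggregate function proposed in the issue.
final class Percentiles {
    // Sorts the values once and reads every requested percentile from the
    // same sorted array (nearest-rank method).
    static long[] percentiles(long[] values, double... fractions) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        long[] out = new long[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            int rank = (int) Math.ceil(fractions[i] * sorted.length);
            out[i] = sorted[Math.max(rank, 1) - 1];
        }
        return out;
    }
}
```

A call such as Percentiles.percentiles(values, .99, .50) then covers both columns of the example query with a single sort.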


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles.




[jira] Commented: (HIVE-592) renaming internal table should rename HDFS and also change path of the table and partitions accordingly.

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806142#action_12806142
 ] 

Zheng Shao commented on HIVE-592:
-

Tried renaming table on both branch-0.4 and branch-0.5. Both worked as expected 
(locations are modified).


 renaming internal table should rename HDFS and also change path of the table 
 and partitions accordingly.
 

 Key: HIVE-592
 URL: https://issues.apache.org/jira/browse/HIVE-592
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: Prasad Chakka
 Fix For: 0.4.0

 Attachments: hive-592.2.patch, hive-592.3.patch


 rename table changes just the name of the table in metastore but not hdfs. so 
 if a table with old name is created, it uses the hdfs directory pointing to 
 the renamed table.




[jira] Updated: (HIVE-1105) Add service script for starting metastore server

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1105:
-

   Resolution: Fixed
Fix Version/s: 0.5.0
 Release Note: 
HIVE-1105. Add service script for starting metastore server. (John Sichi via 
zshao)
New command for starting a Hive metastore server:
hive --service metastore


  was:
New command for starting a Hive metastore server:

hive --service metastore


 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to branch-0.5 and trunk. Thanks John!

Please mention this is available since 0.5.


 Add service script for starting metastore server
 

 Key: HIVE-1105
 URL: https://issues.apache.org/jira/browse/HIVE-1105
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.4.1
Reporter: John Sichi
Assignee: John Sichi
Priority: Minor
 Fix For: 0.5.0, 0.6.0

 Attachments: HIVE-1105.1.patch


 The instructions on this page recommend running Java directly in order to 
 start the metastore:
 http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
 Since we already have a generic service-starter script, it would be nice to 
 be able to do this instead:
 hive --service metastore
 I've written a metastore.sh for this purpose.




[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a table type enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806154#action_12806154
 ] 

John Sichi commented on HIVE-1068:
--

I have written up metastore upgrade instructions here:

http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades


 CREATE VIEW followup:  add a table type enum attribute in metastore's 
 MTable, and also null out irrelevant attributes for MTable instances which 
 describe views
 -

 Key: HIVE-1068
 URL: https://issues.apache.org/jira/browse/HIVE-1068
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1068.1.patch


 Zheng's description:
 5. TODO: Metadata change: We store view definitions in the same metadata 
 table that we store table definitions.
 Shall we add a field table type so we know whether it's a table, external 
 table, view, or materialized view in the future.
 We should clean up the additional useless fields in view - the test output 
 shows that we are storing some garbage information for views.




[jira] Created: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)
Make QueryPlan serializable
---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0


We need to make QueryPlan serializable so that we can resume the query some 
time later.





[jira] Commented: (HIVE-972) support views

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806157#action_12806157
 ] 

John Sichi commented on HIVE-972:
-

Metastore upgrade instructions are here (they assume HIVE-1068 will be 
accepted):

http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades

There may be a more appropriate location for them in the wiki; once we have 
figured that out, we should link it from the release notes.


 support views
 -

 Key: HIVE-972
 URL: https://issues.apache.org/jira/browse/HIVE-972
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Namit Jain
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-972.1.patch, HIVE-972.2.patch, HIVE-972.3.patch


 Hive currently does not support views. 
 It would be a very nice feature to have.




[jira] Updated: (HIVE-1092) Conditional task does not increase finished job counter when filter job out.

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1092:
---

Attachment: hive-1092.patch

 Conditional task does not increase finished job counter when filter job out.
 

 Key: HIVE-1092
 URL: https://issues.apache.org/jira/browse/HIVE-1092
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1092.patch


 Ended Job = 864272330, job is filtered out (removed at runtime).
 Launching Job 2 out of 3 
 It should be 'Launching Job 3 out of 3'.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806167#action_12806167
 ] 

Ning Zhang commented on HIVE-1109:
--

Code changes look good. Is there a unit test already for temp directories?

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.
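
The naming idea above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `ScratchDirNames`, the `hive_<timestamp>_<id>` layout, and the timestamp pattern are all hypothetical and are not taken from the attached patches.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hive: embeds the creation time in a
// scratch directory name so stale directories can be recognized later.
final class ScratchDirNames {
    // Assumed layout: <base>/hive_<yyyy-MM-dd_HH-mm-ss_SSS>_<id>
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss_SSS");

    public static String scratchDirName(String base, LocalDateTime created, long id) {
        return base + "/hive_" + created.format(STAMP) + "_" + id;
    }

    // Recovers the creation time from a name built by scratchDirName.
    public static LocalDateTime createdAt(String dirName) {
        String leaf = dirName.substring(dirName.lastIndexOf('/') + 1);
        String stamp = leaf.substring("hive_".length(), leaf.lastIndexOf('_'));
        return LocalDateTime.parse(stamp, STAMP);
    }

    // True when the directory was created before the cutoff.
    public static boolean olderThan(String dirName, LocalDateTime cutoff) {
        return createdAt(dirName).isBefore(cutoff);
    }

    public static void main(String[] args) {
        String dir = scratchDirName("/tmp/hive-dwh", LocalDateTime.now(), 10002L);
        System.out.println(dir + " older than a week: "
            + olderThan(dir, LocalDateTime.now().minusDays(7)));
    }
}
```

With a scheme like this, a cleanup job could list the scratch root and delete only directories older than some cutoff, while directories belonging to a query that may still be resumed are kept.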




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806169#action_12806169
 ] 

Zheng Shao commented on HIVE-1109:
--

There is no separate test for temp directories.
But every query we test uses the temp directories (which are changed by this 
patch), so I think as long as all the current tests pass, it should be good.


 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Resolved: (HIVE-1110) add counters to show that skew join triggered

2010-01-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1110.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang

 add counters to show that skew join triggered
 -

 Key: HIVE-1110
 URL: https://issues.apache.org/jira/browse/HIVE-1110
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1110.2.patch, hive-1110.patch


 It would be very useful to debug, and quickly find out if the skew join was 
 triggered.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806173#action_12806173
 ] 

Ning Zhang commented on HIVE-1109:
--

+1.

Will commit if tests pass.

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806183#action_12806183
 ] 

Carl Steinbach commented on HIVE-259:
-


@Jerome: Agreed. Allowing sort results to be shared by multiple functions (like 
in the following example) is key to supporting analytic functions efficiently.

{code:sql}
SELECT department_id,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) AS "Median cont",
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) AS "Median disc"
  FROM employees
 GROUP BY department_id;
{code}
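
A minimal Java sketch of the point about sharing one sort between several functions: the values are sorted once and both percentiles are read from the same sorted array. The names (`PercentileSketch`, `percentileCont`, `percentileDisc`) are hypothetical. The sketch assumes the usual SQL definitions (linear interpolation for PERCENTILE_CONT, the first value whose cumulative position reaches p for PERCENTILE_DISC), an ascending sort, a non-empty input, and 0 <= p <= 1. It is not Hive code.

```java
import java.util.Arrays;

// Hypothetical illustration, not Hive code: one sort, two percentile functions.
final class PercentileSketch {
    // Continuous percentile: interpolate between the two nearest rows.
    public static double percentileCont(double[] sorted, double p) {
        double row = p * (sorted.length - 1);   // 0-based fractional row number
        int lo = (int) Math.floor(row);
        int hi = (int) Math.ceil(row);
        if (lo == hi) {
            return sorted[lo];
        }
        return (hi - row) * sorted[lo] + (row - lo) * sorted[hi];
    }

    // Discrete percentile: first value whose cumulative position is >= p.
    public static double percentileDisc(double[] sorted, double p) {
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        double[] salaries = {4000, 1000, 3000, 2000};
        Arrays.sort(salaries);                               // the shared sort
        System.out.println(percentileCont(salaries, 0.5));   // prints 2500.0
        System.out.println(percentileDisc(salaries, 0.5));   // prints 2000.0
    }
}
```

With a descending sort, as in the query above, the discrete median of the same four values would be 3000 instead, while the continuous median stays at 2500.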

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles




[jira] Assigned: (HIVE-705) Let Hive can analyse hbase's tables

2010-01-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-705:
---

Assignee: John Sichi  (was: Samuel Guo)

 Let Hive can analyse hbase's tables
 ---

 Key: HIVE-705
 URL: https://issues.apache.org/jira/browse/HIVE-705
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Samuel Guo
Assignee: John Sichi
 Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
 HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
 HIVE-705_revision883033.patch


 Add a serde over the hbase's tables, so that hive can analyse the data stored 
 in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806193#action_12806193
 ] 

John Sichi commented on HIVE-705:
-

I'm going to start working on getting this ready for submission against latest 
trunk.

 Let Hive can analyse hbase's tables
 ---

 Key: HIVE-705
 URL: https://issues.apache.org/jira/browse/HIVE-705
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Samuel Guo
Assignee: John Sichi
 Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
 HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
 HIVE-705_revision883033.patch


 Add a serde over the hbase's tables, so that hive can analyse the data stored 
 in hbase easily.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806227#action_12806227
 ] 

Ning Zhang commented on HIVE-1109:
--

Zheng, two negative unit tests failed: fs_default_name1.q, fs_default_name2.q. 
Can you take a look?

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1109:
-

Attachment: HIVE-1109.3.patch

Thanks Ning. Fixed the 2 failed negative tests.

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806237#action_12806237
 ] 

Ning Zhang commented on HIVE-1109:
--

Zheng, fs_default_name2.q is still failing. It seems getExternalScratchDir needs 
to catch the IllegalParameterException as well.

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Created: (HIVE-1118) Hive merge map files should have different bytes/mapper setting

2010-01-28 Thread Zheng Shao (JIRA)
Hive merge map files should have different bytes/mapper setting
---

 Key: HIVE-1118
 URL: https://issues.apache.org/jira/browse/HIVE-1118
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao


Currently, by default, we get one reducer for each 1GB of input data.
It's also true for the conditional merge job that will run if the average file 
size is smaller than a threshold.

This actually makes those jobs very slow, because each reducer needs to consume 
1GB of data.

Alternatively, we can just use that threshold to determine the number of 
reducers per job (or introduce a new parameter).
Let's say the threshold is 1MB; then we only start the merge job if the 
average file size is less than 1MB, and the eventual result file size will be 
around 1MB (or another small number).

This will remove the extreme cases where we have thousands of empty files, but 
still make normal jobs fast enough.
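
The arithmetic behind this proposal can be sketched in Java as below. The class and method names (`MergeJobSizing`, `reducersFor`, `shouldMerge`) are hypothetical, and the sketch only restates the rule described above (one reducer per fixed number of input bytes, merge only when the average file size is below a threshold). It does not show Hive's actual configuration parameters or code.

```java
// Hypothetical illustration of the sizing rule discussed above; not Hive code.
final class MergeJobSizing {
    // One reducer per bytesPerReducer of input, rounded up, and at least one.
    public static long reducersFor(long totalInputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return Math.max(1L, needed);
    }

    // The merge job runs only when the average file size is below the threshold.
    public static boolean shouldMerge(long totalBytes, long fileCount, long avgSizeThreshold) {
        return fileCount > 0 && totalBytes / fileCount < avgSizeThreshold;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long mb = 1L << 20;
        long input = 10 * gb;
        System.out.println(reducersFor(input, gb));   // prints 10
        System.out.println(reducersFor(input, mb));   // prints 10240
    }
}
```

The two printed values show the trade-off: with 1GB per reducer each reducer handles a lot of data, while with 1MB per reducer the same input is spread over far more reducers, which is why a separate parameter for the merge job may be preferable to reusing either value.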





[jira] Updated: (HIVE-1109) Structured temporary directories

2010-01-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1109:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Zheng!

 Structured temporary directories
 

 Key: HIVE-1109
 URL: https://issues.apache.org/jira/browse/HIVE-1109
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1109.1.patch, HIVE-1109.2.patch, HIVE-1109.3.patch


 Currently Hive execution uses a lot of temporary directories. These 
 directories are NOT named by date or time, so it's impossible to know what 
 are the temporary directories (in case the query failed to clean up) that can 
 be deleted safely.
 We should have a better temporary directory structure, with the date and time 
 in the directory name.
 This will help a lot when we are able to resume a query that failed in the 
 middle, because we need to preserve the temporary directories for that query.




[jira] Commented: (HIVE-1068) CREATE VIEW followup: add a table type enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views

2010-01-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806243#action_12806243
 ] 

Zheng Shao commented on HIVE-1068:
--

The meaning of isNativeSerDe(String) changed. Can we rename it to 
shouldGetColsFromSerDe (and invert the condition)?


 CREATE VIEW followup:  add a table type enum attribute in metastore's 
 MTable, and also null out irrelevant attributes for MTable instances which 
 describe views
 -

 Key: HIVE-1068
 URL: https://issues.apache.org/jira/browse/HIVE-1068
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1068.1.patch


 Zheng's description:
 5. TODO: Metadata change: We store view definitions in the same metadata 
 table in which we store table definitions.
 Shall we add a table type field so we know whether it's a table, external 
 table, view, or (in the future) materialized view?
 We should clean up the additional useless fields in views - the test output 
 shows that we are storing some garbage information for views.




[jira] Updated: (HIVE-1092) Conditional task does not increase finished job counter when filter job out.

2010-01-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1092:
---

Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

 Conditional task does not increase finished job counter when filter job out.
 

 Key: HIVE-1092
 URL: https://issues.apache.org/jira/browse/HIVE-1092
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1092.patch


 Ended Job = 864272330, job is filtered out (removed at runtime).
 Launching Job 2 out of 3 
 It should be 'Launching Job 3 out of 3'.




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Attachment: HIVE-1119.1.patch

This patch makes all tasks and works serializable.

Currently there are no additional tests.
Once the whole QueryPlan is serializable, we will serialize the whole query 
plan after compilation and deserialize the whole plan before execution. This 
will automatically test all the serialization/deserialization.

I prefer to commit this one first given that major code refactoring/clean-up is 
happening.


 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1119.1.patch


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Status: Patch Available  (was: Open)

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1119.1.patch


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.




[jira] Updated: (HIVE-1119) Make all Tasks and Works serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1119:
-

Attachment: HIVE-1119.1.patch

This patch makes all tasks and works serializable.
Currently there are no additional tests.
Once the whole QueryPlan is serializable, we will serialize the whole query 
plan after compilation and deserialize the whole plan before execution. This 
will automatically test all the serialization/deserialization.

I prefer to commit this one first given that major code refactoring/clean-up is 
happening.
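
The serialize-then-deserialize check described above can be sketched as a round trip in Java. This is an assumption-laden illustration: it uses standard `java.io` object serialization, which may not be the mechanism the patch relies on, and `PlanRoundTrip` and `WorkStub` are hypothetical stand-ins rather than Hive classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Hive code: write an object graph to bytes
// after "compilation" and read it back before "execution".
final class PlanRoundTrip {
    // Stand-in for a task or work node with child nodes.
    public static final class WorkStub implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public final List<WorkStub> children = new ArrayList<>();

        public WorkStub(String name) {
            this.name = name;
        }
    }

    public static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore object", e);
        }
    }

    public static void main(String[] args) {
        WorkStub root = new WorkStub("root");
        root.children.add(new WorkStub("mapred-1"));
        WorkStub copy = (WorkStub) deserialize(serialize(root));
        System.out.println(copy.name + " has " + copy.children.size() + " child");
    }
}
```

If every query in the existing test suite passes through such a round trip between compilation and execution, any task or work class that is not serializable fails immediately, which is the implicit test coverage the comment refers to.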

 Make all Tasks and Works serializable
 -

 Key: HIVE-1119
 URL: https://issues.apache.org/jira/browse/HIVE-1119
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1119.1.patch


 All Tasks/Works (not just MapredTask and MapredWork) should be serializable.




[jira] Updated: (HIVE-1119) Make all Tasks and Works serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1119:
-

Status: Patch Available  (was: Open)

 Make all Tasks and Works serializable
 -

 Key: HIVE-1119
 URL: https://issues.apache.org/jira/browse/HIVE-1119
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1119.1.patch


 All Tasks/Works (not just MapredTask and MapredWork) should be serializable.




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-01-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1117:
-

Attachment: (was: HIVE-1119.1.patch)

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.
