[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897592#action_12897592
 ] 

John Sichi commented on HIVE-1495:
--

I got failures in two tests:

testCliDriver_index_compact
testCliDriver_protectmode



> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1495:
-

Status: Open  (was: Patch Available)

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-11 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897587#action_12897587
 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
---

For a table input3 with the following data:
||key || value ||
|NULL |   35 |
|12 | NULL|
|10 | 1000 |
|10 | 100|
|100|100 |

The queries
{code}
 SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 100);
and
 SELECT * FROM input3 a right outer JOIN input3 b ON (a.key=b.key AND b.key < 100);
{code} 
produce the following output:
{noformat}
10  100010  100
10  100010  1000
10  100 10  100
10  100 10  1000
12  NULL12  NULL
{noformat}

Whereas the expected output for
"SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 100);" is
| NULL |   35 | NULL | NULL |
|   10 | 1000 |   10 | 1000 |
|   10 | 1000 |   10 |  100 |
|  100 |  100 | NULL | NULL |
|   12 | NULL |   12 | NULL |
|   10 |  100 |   10 | 1000 |
|   10 |  100 |   10 |  100 |

and the expected output for
"SELECT * FROM input3 a right outer JOIN input3 b ON (a.key=b.key AND b.key < 100);" is
| NULL | NULL | NULL |   35 |
|   10 | 1000 |   10 | 1000 |
|   10 |  100 |   10 | 1000 |
| NULL | NULL |  100 |  100 |
|   12 | NULL |   12 | NULL |
|   10 | 1000 |   10 |  100 |
|   10 |  100 |   10 |  100 |
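The preserved-row semantics behind these expected outputs can be checked outside Hive. The sketch below replays the same data and the left outer join query in SQLite; the assumption is that SQLite's LEFT OUTER JOIN follows standard semantics, which is exactly the behavior this bug report expects from Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input3 (key INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO input3 VALUES (?, ?)",
    [(None, 35), (12, None), (10, 1000), (10, 100), (100, 100)],
)

# The AND predicate lives in the ON clause, so it only decides which
# pairs match; it must never filter rows out of the preserved (left) side.
rows = conn.execute(
    "SELECT * FROM input3 a LEFT OUTER JOIN input3 b "
    "ON (a.key = b.key AND a.key < 100)"
).fetchall()
print(len(rows))  # 7: all five left-side rows survive, two of them match twice
```

Pushing `a.key < 100` down to the left side before the join (the buggy behavior) would instead drop the `NULL/35` and `100/100` rows, producing the 5-row output shown above.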


> predicate pushdown does not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> The Hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)




[jira] Created: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-11 Thread Amareshwari Sriramadasu (JIRA)
predicate pushdown does not work correctly with outer joins
---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu


The Hive documentation for predicate pushdown says:
Left outer join: predicates on the left side aliases are pushed
Right outer join: predicates on the right side aliases are pushed

But this pushdown should not happen for AND predicates in join queries:
ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)





[jira] Updated: (HIVE-1293) Concurrency Model for Hive

2010-08-11 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Attachment: hive.1293.5.patch

new patch

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only
> guarantee provided in the case of concurrent readers and writers is that a
> reader will not see partial data from the old version (before the write) and
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.
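As an illustration of option 1 above, here is a minimal, hypothetical sketch (plain in-process threading locks, not Hive's eventual ZooKeeper-based implementation; all names are invented) of acquiring a query's locks in one fixed global order, the standard way to address the deadlock concern:

```python
import threading

# Hypothetical per-table locks; a real implementation would use
# shared/exclusive locks in an external service such as ZooKeeper.
locks = {name: threading.Lock() for name in ["t1", "t2", "t3"]}

def run_query(read_tables, write_tables):
    # Acquire every needed lock in a fixed (sorted) global order, so two
    # concurrent queries can never hold-and-wait in opposite orders.
    needed = sorted(set(read_tables) | set(write_tables))
    held = []
    try:
        for name in needed:
            locks[name].acquire()
            held.append(name)
        return list(held)  # the query itself would run here
    finally:
        for name in reversed(held):
            locks[name].release()

print(run_query(["t2"], ["t1"]))  # ['t1', 't2']
```

Sorting by table name gives a total order over lock acquisition, which rules out circular waits between any two queries regardless of which tables they touch.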




[jira] Issue Comment Edited: (HIVE-1293) Concurrency Model for Hive

2010-08-11 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897577#action_12897577
 ] 

Namit Jain edited comment on HIVE-1293 at 8/12/10 12:51 AM:


Did the cleanups and changed default value of hive.support.concurrency to false

Not sure how we can set a default value for hive.zookeeper.client.port.


Let us do the lib cleanup in a follow-up - filed 
https://issues.apache.org/jira/browse/HIVE-1533

  was (Author: namit):
Did the cleanups and changed default value of hive.support.concurrency to 
false

Not sure how we can set a default value for hive.zookeeper.client.port.


Let us do the lib cleanup in a follow-up - I will file a jira
  
> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only
> guarantee provided in the case of concurrent readers and writers is that a
> reader will not see partial data from the old version (before the write) and
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Created: (HIVE-1533) Use ZooKeeper from maven

2010-08-11 Thread Namit Jain (JIRA)
Use ZooKeeper from maven


 Key: HIVE-1533
 URL: https://issues.apache.org/jira/browse/HIVE-1533
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Namit Jain


Zookeeper is now available from maven. Maybe we should delete the one in 
hbase-handler/lib and get it via ivy instead of adding it in the top-level lib? 
The version we have checked in is 3.2.2, but the maven availability is 3.3.x, 
so we'd need to test to make sure everything (including hbase-handler) still 
works with the newer version.
http://mvnrepository.com/artifact/org.apache.hadoop/zookeeper




[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-11 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897577#action_12897577
 ] 

Namit Jain commented on HIVE-1293:
--

Did the cleanups and changed default value of hive.support.concurrency to false

Not sure how we can set a default value for hive.zookeeper.client.port.


Let us do the lib cleanup in a follow-up - I will file a jira

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only
> guarantee provided in the case of concurrent readers and writers is that a
> reader will not see partial data from the old version (before the write) and
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1495:
-

Status: Patch Available  (was: Open)

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897568#action_12897568
 ] 

John Sichi commented on HIVE-1495:
--

Running tests now.


> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread Pierre Huyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897557#action_12897557
 ] 

Pierre Huyn commented on HIVE-1529:
---

Hi John,

Thanks for your help. Now that I am getting further: is there a recommended way
to create and populate the table used in my .q test file? I looked at the other
.q files, and many of them use "src" without defining it. Also, I don't see
where "src" is populated. Help!
Regards
--- Pierre



> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1495:
---

Attachment: hive-1495.4.patch

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Created: (HIVE-1532) Replace globStatus with listStatus inside Hive.java's replaceFiles.

2010-08-11 Thread He Yongqiang (JIRA)
Replace globStatus with listStatus inside Hive.java's replaceFiles.
---

 Key: HIVE-1532
 URL: https://issues.apache.org/jira/browse/HIVE-1532
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


globStatus expects a glob pattern, so if there are special characters
(like '{' or '[') in the file path, this function will fail.

We should be able to replace this call with listStatus easily, since we are not
passing a pattern to replaceFiles(). The only places replaceFiles is called
are loadPartition and Table's replaceFiles.
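The failure mode is easy to reproduce with any glob-style API. The sketch below uses Python's glob versus os.listdir as a stand-in for globStatus versus listStatus; this is an analogy, not Hive or Hadoop code.

```python
import glob
import os
import tempfile

# A file whose name contains '[' -- legal in a path, special in a glob.
d = tempfile.mkdtemp()
weird = os.path.join(d, "part-[0]")
open(weird, "w").close()

# Interpreted as a pattern, '[0]' is a character class, so the glob
# looks for 'part-0', which does not exist; a plain directory listing
# (the listStatus analogue) sees the file without any interpretation.
print(glob.glob(weird))   # []
print(os.listdir(d))      # ['part-[0]']
```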




[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897548#action_12897548
 ] 

John Sichi commented on HIVE-1293:
--

Two configuration questions:

* You have hive.support.concurrency=true in hive-default.xml.  We probably want
it to be false instead (enabled only during tests), since most people using Hive
won't have a ZooKeeper quorum set up.

* Isn't there a default value we can use for hive.zookeeper.client.port?

One lib question:

* Zookeeper is now available from maven.  Maybe we should delete the one in 
hbase-handler/lib and get it via ivy instead of adding it in the top-level lib? 
 The version we have checked in is 3.2.2, but the maven availability is 3.3.x, 
so we'd need to test to make sure everything (including hbase-handler) still 
works with the newer version.

http://mvnrepository.com/artifact/org.apache.hadoop/zookeeper

Two cleanups:

* In QTestUtil.java, you left the following code commented out; can we get rid 
of it?

+  //  for (int i = 0; i < qfiles.length; i++) {
+  //qsetup[i].tearDown();
+  //  }

* In DDLTask.java, you left some commented-out debugging code (two instances):

+//console.printError("conflicting lock present " + tbl + " 
cannot be locked in mode " + mode);



> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only
> guarantee provided in the case of concurrent readers and writers is that a
> reader will not see partial data from the old version (before the write) and
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Updated: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1531:
-

Status: Patch Available  (was: Open)

> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.




[jira] Assigned: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-1531:


Assignee: Carl Steinbach

> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.




[jira] Updated: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1531:
-

Attachment: HIVE-1531.patch.txt

HIVE-1531.patch.txt:

* Changed Ivy offline defaultTTL from "eternal" to "1000d"
* Verified that this change allows the Hive build to work with Ivy 2.0.0



> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.




[jira] Updated: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1531:
-

Fix Version/s: 0.6.0

> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.




[jira] Commented: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897537#action_12897537
 ] 

HBase Review Board commented on HIVE-1515:
--

Message from: "Paul Yang" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/598/#review853
---


Talked to Yongqiang offline about this one. The way that this patch attempts to
fix the caching issue is to append some path information to the host so that we
create a new HAR filesystem instance for different HAR files. The way that this
is implemented now, a "-" and path information are added to the host, e.g.
har://hdfs-localhost-user--warehouse--mytable:50030... if the original were
har://hdfs-localhost:50030. However, the HAR filesystem does not ignore the
characters after the second "-", and so fails when trying to connect to the
underlying filesystem. A possible fix would be to modify HiveHarFileSystem to
extend the initialize() method so that the characters after the second "-" are
ignored.

- Paul
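The proposed initialize() tweak can be sketched in isolation. The helper name below is invented for illustration, and the mangled authority string is taken from the example in the comment above; this is not Hive's actual HiveHarFileSystem code.

```python
# Hypothetical helper: keep only the "scheme-host" prefix of a mangled
# authority, discarding the cache-busting path suffix after the second "-".
def strip_cache_suffix(authority):
    parts = authority.split("-")
    return "-".join(parts[:2])

print(strip_cache_suffix("hdfs-localhost-user--warehouse--mytable"))
# hdfs-localhost
```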





> archive is not working when multiple partitions inside one table are archived.
> --
>
> Key: HIVE-1515
> URL: https://issues.apache.org/jira/browse/HIVE-1515
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1515.1.patch
>
>
> set hive.exec.compress.output = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size=256;
> set mapred.min.split.size.per.node=256;
> set mapred.min.split.size.per.rack=256;
> set mapred.max.split.size=256;
> set hive.archive.enabled = true;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="00");
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="001");
> select key, value, ds, hr from combine_3_srcpart_seq_rc where ds="2010-08-03" 
> order by key, hr limit 30;
> drop table combine_3_srcpart_seq_rc;
> will fail.
> java.io.IOException: Invalid file name: 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
>  in 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> It fails because there are 2 input paths (one for each partition) for the above query:
> 1): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
> 2): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
> But when calling path.getFileSystem() on these 2 input paths, both return the
> same FileSystem instance, which points to the first caller, in this case
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> The reason is that Hadoop's FileSystem keeps a global cache, and when loading
> a FileSystem instance for a given path, it only takes the path's scheme
> and username to look up the cache. So when we do Path.getFileSystem for the
> second har path, it actually returns the FileSystem handle for the first
> path.
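The caching behavior described in the quoted report can be mimicked with a toy cache keyed only on (scheme, authority). This is an illustrative model with invented names, not Hadoop's actual FileSystem.Cache code.

```python
from urllib.parse import urlparse

class ToyFileSystem:
    """Toy model of a filesystem cache keyed only on (scheme, authority)."""
    _cache = {}

    def __init__(self, uri):
        self.uri = uri  # remembers whichever path created the instance

    @classmethod
    def get(cls, uri):
        parsed = urlparse(uri)
        key = (parsed.scheme, parsed.netloc)  # the path is NOT in the key
        if key not in cls._cache:
            cls._cache[key] = cls(uri)
        return cls._cache[key]

fs1 = ToyFileSystem.get("har:/warehouse/t/ds=2010-08-03/hr=00/data.har")
fs2 = ToyFileSystem.get("har:/warehouse/t/ds=2010-08-03/hr=001/data.har")
print(fs1 is fs2)  # True: the second lookup returns the first handle
print(fs1.uri)     # har:/warehouse/t/ds=2010-08-03/hr=00/data.har
```

Because both HAR paths share the scheme and (empty) authority, the second lookup silently reuses the instance created for the first partition, which is the bug.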




[jira] Created: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-11 Thread Carl Steinbach (JIRA)
Make Hive build work with Ivy versions < 2.1.0
--

 Key: HIVE-1531
 URL: https://issues.apache.org/jira/browse/HIVE-1531
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Carl Steinbach


Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop and 
Pig),
yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
many users
have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
copy will
always get picked up in preference to what the Hive build downloads for itself.





[DISCUSSION] Move to become a TLP

2010-08-11 Thread Ashish Thusoo
Folks,

This question has come up in the PMC once again, and it would be great to hear
once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

[jira] Updated: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-11 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1515:


Status: Open  (was: Patch Available)

See comments on reviewboard.

> archive is not working when multiple partitions inside one table are archived.
> --
>
> Key: HIVE-1515
> URL: https://issues.apache.org/jira/browse/HIVE-1515
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1515.1.patch
>
>
> set hive.exec.compress.output = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size=256;
> set mapred.min.split.size.per.node=256;
> set mapred.min.split.size.per.rack=256;
> set mapred.max.split.size=256;
> set hive.archive.enabled = true;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="00");
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="001");
> select key, value, ds, hr from combine_3_srcpart_seq_rc where ds="2010-08-03" 
> order by key, hr limit 30;
> drop table combine_3_srcpart_seq_rc;
> will fail.
> java.io.IOException: Invalid file name: 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
>  in 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> It fails because there are 2 input paths (one for each partition) for the above query:
> 1): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
> 2): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
> But when we call path.getFileSystem() on these 2 input paths, both calls 
> return the same FileSystem instance, which points at whichever path was 
> looked up first, in this case 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> The reason is that Hadoop's FileSystem keeps a global cache, and the cache 
> lookup for a given path uses only the path's scheme and authority (plus the 
> user), not the full path. So when we call Path.getFileSystem for the second 
> har path, it actually returns the FileSystem handle already cached for the 
> first path.
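The collision described above can be sketched with plain Java, with no Hadoop dependency (the class and method names below are illustrative, not Hadoop's actual internals): the cache key is built from scheme and authority only, so two har paths that differ only in their path component map to the same entry.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a scheme+authority keyed cache, mirroring how a
// global FileSystem cache can hand back the wrong instance for har paths.
public class FsCacheSketch {
    static final Map<String, String> CACHE = new HashMap<>();

    // Key is built from scheme and authority only; the path is ignored.
    static String cacheKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // Returns the cached "filesystem" for the key if present, else creates one
    // rooted at the given URI (simulated here by remembering the full URI).
    static String getFileSystem(URI uri) {
        return CACHE.computeIfAbsent(cacheKey(uri), k -> uri.toString());
    }

    public static void main(String[] args) {
        URI p1 = URI.create("har:/warehouse/t/ds=2010-08-03/hr=00/data.har");
        URI p2 = URI.create("har:/warehouse/t/ds=2010-08-03/hr=001/data.har");
        String fs1 = getFileSystem(p1);
        String fs2 = getFileSystem(p2);
        // Both lookups hit the same cache entry, so fs2 still points at hr=00.
        assert fs1.equals(fs2) : "expected the cached instance to be reused";
        System.out.println(fs2); // the hr=00 archive, not hr=001
    }
}
```

Run with `java -ea FsCacheSketch` to enable the assertion; the second lookup silently reuses the first archive's entry, which is exactly the failure mode above.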

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897521#action_12897521
 ] 

John Sichi commented on HIVE-1495:
--

Oops, actually: the interface method should declare the new parameters as 
Set<> rather than HashSet<>.

Also, for the Javadoc, we should make it clear that the handler implementation 
is supposed to add information to these sets as a side effect of the call (i.e. 
they are supplemental outputs going along with the return value).  

While we're at it:

* move the new parameters to be last in the method signature (and Javadoc) to 
distinguish them from the read-only parameters

* can we get rid of the db parameter altogether? The only thing it is used for 
in CompactIndexHandler is to get the configuration, and we already have that 
from setConf.

Sorry for not noticing these earlier; could you give me one more patch?
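The convention being asked for can be sketched as follows (the interface and names are illustrative, not the actual HiveIndexHandler signature): declare the supplemental outputs against the Set interface, document the side effect in the Javadoc, and order them after the read-only parameters.

```java
import java.util.HashSet;
import java.util.Set;

interface IndexHandlerSketch {
    /**
     * Plans an index rebuild (illustrative stand-in for the real method).
     *
     * @param indexName read-only input: the index to rebuild
     * @param inputs    supplemental output: entities read are ADDED to this
     *                  set as a side effect of the call
     * @param outputs   supplemental output: entities written are ADDED to
     *                  this set as a side effect of the call
     */
    String planRebuild(String indexName, Set<String> inputs, Set<String> outputs);
}

public class SetParamSketch implements IndexHandlerSketch {
    @Override
    public String planRebuild(String indexName, Set<String> inputs, Set<String> outputs) {
        // Side-effect outputs, supplementing the return value.
        inputs.add("default@" + indexName + "_base");
        outputs.add("default@" + indexName + "_idx");
        return "rebuild-task:" + indexName;
    }

    public static void main(String[] args) {
        Set<String> in = new HashSet<>();   // callers may pass any Set impl
        Set<String> out = new HashSet<>();
        String task = new SetParamSketch().planRebuild("t1", in, out);
        assert task.equals("rebuild-task:t1");
        assert in.contains("default@t1_base") && out.contains("default@t1_idx");
    }
}
```

Declaring Set<> rather than HashSet<> keeps callers free to pass any implementation (e.g. a LinkedHashSet for deterministic ordering).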


> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897511#action_12897511
 ] 

John Sichi commented on HIVE-1495:
--

+1.  Will commit when tests pass.


> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897510#action_12897510
 ] 

John Sichi commented on HIVE-1529:
--

Run ant test with -Doverwrite=true


> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.
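For reference, the two statistics differ only in the divisor; a minimal, self-contained sketch of the math (not Hive's actual streaming UDAF implementation, which must also merge partial aggregates across mappers):

```java
// Population vs. sample covariance over two equal-length columns.
public class CovarSketch {
    static double covarPop(double[] x, double[] y) {
        return sumDev(x, y) / x.length;            // divide by n
    }

    static double covarSamp(double[] x, double[] y) {
        return sumDev(x, y) / (x.length - 1);      // divide by n - 1
    }

    // Sum of products of deviations from the two column means.
    private static double sumDev(double[] x, double[] y) {
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - mx) * (y[i] - my);
        return s;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        // Sum of deviation products is 10, n = 4.
        assert covarPop(x, y) == 2.5;
        assert covarSamp(x, y) == 10.0 / 3.0;
    }
}
```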




[jira] Commented: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread Pierre Huyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897502#action_12897502
 ] 

Pierre Huyn commented on HIVE-1529:
---

Hi John,

Now that I have created the .q test file, how do I generate the corresponding 
.q.out file? I assume the latter is needed by "ant test". Thanks.

Regards
--- Pierre
 



> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897494#action_12897494
 ] 

He Yongqiang commented on HIVE-1495:


Thanks for the comments, John. Uploaded a new patch.

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1495:
---

Attachment: hive-1495.3.patch

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Created: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-11 Thread Carl Steinbach (JIRA)
Include hive-default.xml and hive-log4j.properties in hive-common JAR
-

 Key: HIVE-1530
 URL: https://issues.apache.org/jira/browse/HIVE-1530
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Carl Steinbach


hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
hive-default.xml file that currently sits in the conf/ directory should be 
removed.

Motivations for this change:
* We explicitly tell users that they should never modify hive-default.xml, yet 
we give them the opportunity to do so by placing the file in the conf dir.
* Many users are familiar with the Hadoop configuration mechanism that does not 
require *-default.xml files to be present in the HADOOP_CONF_DIR, and assume 
that the same is true for HIVE_CONF_DIR.
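The mechanism this relies on is classpath-resource loading: a file bundled inside a JAR is read as a resource, not from a directory the user can edit. A minimal sketch (the resource name is illustrative):

```java
import java.io.InputStream;

// Sketch: configuration defaults bundled inside a JAR are read as classpath
// resources, not as files in a conf/ directory the user can edit.
public class ClasspathResourceSketch {
    static boolean resourceExists(String name) {
        try (InputStream in =
                 ClasspathResourceSketch.class.getClassLoader().getResourceAsStream(name)) {
            return in != null;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "hive-default.xml" is found only if some JAR on the classpath
        // actually bundles it (it is not bundled in this sketch).
        assert !resourceExists("hive-default.xml");
        // A resource that is always resolvable: a class file of the JDK itself.
        assert String.class.getResourceAsStream("String.class") != null;
    }
}
```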






[jira] Issue Comment Edited: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897464#action_12897464
 ] 

John Sichi edited comment on HIVE-1495 at 8/11/10 5:30 PM:
---

* Can you add Javadoc text for the new inputs/outputs @params to 
HiveIndexHandler.generateIndexBuildTaskList (and move them below the @param for 
baseTbl)?  It's important for interface methods to be documented as precisely 
as possible.

* You are missing the non-partitioned case (see the output for the example I 
gave above in index_compact_1.q.out, which is still missing input/output).

* I think driver.compile should already be constructing read/write entities for 
the reentrant INSERT.  You can access them as 
driver.getQueryPlan().getInputs/Outputs() and copy them into the sets passed in 
by the caller (rather than constructing new ones explicitly).  This would be 
more robust.





  was (Author: jvs):
* Can you add Javadoc text for the new inputs/outputs @params to 
HiveIndexHandler.generateIndexBuildTaskList (and move them below )?  It's 
important for interface methods to be documented as precisely as possible.

* You are missing the non-partitioned case (see the output for the example I 
gave above in index_compact_1.q.out, which is still missing input/output).

* I think driver.compile should already be constructing read/write entities for 
the reentrant INSERT.  You can access them as 
driver.getQueryPlan().getInputs/Outputs() and copy them into the sets passed in 
by the caller (rather than constructing new ones explicitly).  This would be 
more robust.




  
> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897464#action_12897464
 ] 

John Sichi commented on HIVE-1495:
--

* Can you add Javadoc text for the new inputs/outputs @params to 
HiveIndexHandler.generateIndexBuildTaskList (and move them below )?  It's 
important for interface methods to be documented as precisely as possible.

* You are missing the non-partitioned case (see the output for the example I 
gave above in index_compact_1.q.out, which is still missing input/output).

* I think driver.compile should already be constructing read/write entities for 
the reentrant INSERT.  You can access them as 
driver.getQueryPlan().getInputs/Outputs() and copy them into the sets passed in 
by the caller (rather than constructing new ones explicitly).  This would be 
more robust.





> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1495:
-

Status: Open  (was: Patch Available)

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.




[jira] Updated: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Affects Version/s: 0.7.0
   (was: 0.6.0)

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Assigned: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1529:


Assignee: Pierre Huyn

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Updated: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1529:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-675:


Status: Open  (was: Patch Available)

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-7-16.patch.txt, 
> HIVE-675-2010-8-4.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.




[jira] Updated: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--


Work is currently under way. Actually, the code is done and tested. I am now 
going through the checklist, moving toward patch submission. Since this is my 
first assignment, I am not familiar with the process and it may take a little 
longer to get to the submission point.

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Pierre Huyn
> Fix For: 0.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.




[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-11 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Attachment: HIVE-1518.1.patch

-- abstracted n-gram estimation heuristic into a new class, refactored ngrams()
-- added context_ngrams()
-- new tests for both ngrams() and context_ngrams()

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases
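The fill-in-the-blanks semantics can be sketched independently of Hive (a toy matcher, not the UDAF's heap-based top-k estimation): slide the context over a sentence and collect the tokens aligned with the nulls.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of context matching: a null in the context is a blank to fill.
public class ContextNgramSketch {
    static List<List<String>> matches(String[] sentence, String[] context) {
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start + context.length <= sentence.length; start++) {
            List<String> filled = new ArrayList<>();
            boolean ok = true;
            for (int j = 0; j < context.length && ok; j++) {
                if (context[j] == null) {
                    filled.add(sentence[start + j]);   // a blank: capture token
                } else {
                    ok = context[j].equals(sentence[start + j]);
                }
            }
            if (ok) out.add(filled);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tweet = {"i", "love", "hive", "and", "i", "love", "hadoop"};
        List<List<String>> m = matches(tweet, new String[]{"i", "love", null});
        // Two matches: the words following "i love" are "hive" and "hadoop".
        assert m.size() == 2;
        assert m.get(0).get(0).equals("hive") && m.get(1).get(0).equals("hadoop");
    }
}
```

The real UDAF then counts these captured fills across all rows and keeps the top-k estimate, rather than returning every match.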




[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-11 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Status: Patch Available  (was: Open)

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases




[jira] Created: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-11 Thread Pierre Huyn (JIRA)
Add covariance aggregate function covar_pop and covar_samp
--

 Key: HIVE-1529
 URL: https://issues.apache.org/jira/browse/HIVE-1529
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Pierre Huyn
 Fix For: 0.6.0


Create new built-in aggregate functions covar_pop and covar_samp, functions 
commonly used in statistical data analyses.




Hudson build is back to normal : Hive-trunk-h0.20 #341

2010-08-11 Thread Apache Hudson Server
See 




[jira] Created: (HIVE-1528) JSON UDTF function

2010-08-11 Thread Ning Zhang (JIRA)
JSON UDTF function
--

 Key: HIVE-1528
 URL: https://issues.apache.org/jira/browse/HIVE-1528
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


Currently the only way to evaluate a path expression on a JSON object is 
through get_json_object. If many fields in the JSON object need to be 
extracted, we have to call this UDF multiple times. 

There are many use cases where get_json_object needs to be called many times in 
one query to convert the JSON object to a relational schema. It would be much 
more desirable to have a JSON UDTF that supports the following syntax:

{code}
select a.id, b.*
from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
as f1, f2, ..., fn
{code}

where the json_table function scans the json_object only once and returns a set 
of tuples (f1, f2, ..., fn). 





Build failed in Hudson: Hive-trunk-h0.19 #518

2010-08-11 Thread Apache Hudson Server
See 

Changes:

[jssarma] HIVE-1524. Parallel Execution fails if mapred.job.name is set
(Ning Zhang via jssarma)

[namit] HIVE-1514. API to change fileformat and location of a partition
(He Yongqiang via namit)

--
[...truncated 13491 lines...]
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 


[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897275#action_12897275
 ] 

HBase Review Board commented on HIVE-675:
-

Message from: "namit jain" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/508/#review838
---



metastore/if/hive_metastore.thrift


Same as John's earlier comment: do we need to change the name?

No one is using this table in the metastore currently, so it might be OK, but I 
am not sure how JDO would handle renaming a column.



ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g


remove KW_SCHEMAS



ql/src/test/queries/clientpositive/database.q


also add a test for 'show databases for a pattern'


- namit





> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-7-16.patch.txt, 
> HIVE-675-2010-8-4.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-11 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897272#action_12897272
 ] 

Namit Jain commented on HIVE-675:
-

comments posted on http://review.cloudera.org/r/508/diff

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-7-16.patch.txt, 
> HIVE-675-2010-8-4.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.




[jira] Updated: (HIVE-1293) Concurrency Model for Hive

2010-08-11 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Attachment: hive.1293.4.patch

minor bug

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.
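Option 1 above maps directly onto a reader/writer lock; a minimal sketch with java.util.concurrent (table-level granularity assumed for illustration; a real implementation also needs a consistent lock-acquisition order to avoid the deadlocks mentioned above):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: many concurrent readers, one exclusive writer, per "table".
public class TableLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String data = "v1";

    String read() {
        lock.readLock().lock();          // shared: readers do not block readers
        try {
            return data;
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(String newData) {
        lock.writeLock().lock();         // exclusive: blocks readers and writers
        try {
            data = newData;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        TableLockSketch t = new TableLockSketch();
        assert t.read().equals("v1");
        t.write("v2");
        assert t.read().equals("v2");
    }
}
```

Option 2 (versioning) trades this blocking for extra storage: the writer publishes a new version while in-flight readers finish against the old one.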




[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache "swallows" cause exception when throwing IOException

2010-08-11 Thread Vladimir Klimontovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Klimontovich updated HIVE-1203:


Status: Patch Available  (was: Open)

> HiveInputFormat.getInputFormatFromCache "swallows" cause exception when 
> throwing IOException
> 
>
> Key: HIVE-1203
> URL: https://issues.apache.org/jira/browse/HIVE-1203
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.4.1, 0.4.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
> Attachments: 0.4.patch, 0.5.patch, trunk.patch
>
>
> To fix this, we simply need to pass the cause as a second argument to the 
> IOException constructor. Patches for 0.4, 0.5 and trunk are available.
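The fix amounts to exception chaining; a sketch of the before/after (method names are illustrative, not the actual Hive code; initCause is used so the sketch also compiles on pre-Java 6, where IOException lacked a (String, Throwable) constructor):

```java
import java.io.IOException;

public class ChainingSketch {
    // Before: the underlying cause is swallowed, losing its stack trace.
    static IOException swallow(Exception cause) {
        return new IOException("cannot load input format: " + cause.getMessage());
    }

    // After: attach the cause so the original stack trace survives.
    static IOException chain(Exception cause) {
        IOException e =
            new IOException("cannot load input format: " + cause.getMessage());
        e.initCause(cause);              // works on pre-Java 6 IOException too
        return e;
    }

    public static void main(String[] args) {
        Exception root = new ClassNotFoundException("SomeInputFormat");
        assert swallow(root).getCause() == null;   // diagnosis lost
        assert chain(root).getCause() == root;     // diagnosis preserved
    }
}
```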
