date:20101026

[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-10-26 Thread Skye Berghel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Skye Berghel updated HIVE-1501:
---

Status: Patch Available  (was: Open)

> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Skye Berghel
> Fix For: 0.7.0
>
> Attachments: 1501.patch, 1501_with_tests.patch
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

2010-10-26 Thread Russell Melick (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1497:
-

Attachment: hive-1497.p3.patch

Coding complete, but unit tests not added yet.  Does not deal with partitions 
either.

> support COMMENT clause on CREATE INDEX, and add new commands for 
> SHOW/DESCRIBE indexes
> --
>
> Key: HIVE-1497
> URL: https://issues.apache.org/jira/browse/HIVE-1497
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
> Attachments: hive-1497.p1.patch, hive-1497.p2.patch, 
> hive-1497.p3.patch
>
>
> We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into 
> account.


[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-10-26 Thread Skye Berghel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Skye Berghel updated HIVE-1501:
---

Attachment: 1501_with_tests.patch

Adding a patch for 1501 that also includes tests.
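
For readers following along, the quoting the issue title asks for can be sketched as below. This is only an illustration in Python, not the attached patch: the function names, the shape of the generated statement, and the choice to reject embedded backticks are all assumptions.

```python
def quote_identifier(name: str) -> str:
    """Wrap a table or column name in backticks."""
    if "`" in name:
        # How embedded backticks should be escaped is not covered here,
        # so this sketch simply refuses such names.
        raise ValueError(f"identifier contains a backtick: {name!r}")
    return f"`{name}`"


def build_rebuild_insert(index_table: str, base_table: str, columns: list) -> str:
    """Build a hypothetical index-rebuild INSERT with every identifier quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {quote_identifier(index_table)} "
        f"SELECT {cols} FROM {quote_identifier(base_table)}"
    )
```

Quoting every identifier means a column named like a keyword (say, "select") cannot break the generated statement.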

> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Skye Berghel
> Fix For: 0.7.0
>
> Attachments: 1501.patch, 1501_with_tests.patch
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Attachment: 1498.2.patch

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Attachment: 1498.patch

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Attachment: (was: 1498.patch)

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Attachment: (was: 1498.patch)

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925263#action_12925263
 ] 

Namit Jain commented on HIVE-474:
-

running tests

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474-3.txt, patch-474.txt
>
>
> The ability to apply DISTINCT to several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user


[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Status: Patch Available  (was: Open)

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474-3.txt, patch-474.txt
>
>
> The ability to apply DISTINCT to several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user


[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Attachment: patch-474-3.txt

Patch is updated to trunk.

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474-3.txt, patch-474.txt
>
>
> The ability to apply DISTINCT to several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Status: Patch Available  (was: Open)

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-26 Thread Marquis Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1498:
---

Attachment: 1498.patch

New patch with ALTER IDXPROPERTIES stuff backed out.

> support IDXPROPERTIES on CREATE INDEX
> -
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Marquis Wang
> Fix For: 0.7.0
>
> Attachments: 1498.patch, 1498.patch, hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.


[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-10-26 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925238#action_12925238
 ] 

Carl Steinbach commented on HIVE-1526:
--

We discussed this at the contributors meeting yesterday. I'm going to rebase 
the patch, modify it to use Thrift 0.5.0, and then make the patch easier to 
review by removing the Thrift-generated code. I plan to get to this sometime 
in the next couple of days.

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure, Clients
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Fix For: 0.7.0
>
> Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
> libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363


Re: release 0.6.0 wrapup

2010-10-26 Thread Carl Steinbach

>
>
> Carl, as release manager, can you send out the release announcement once
> everything is ready?  I'll be at ApacheCon US next week in Atlanta and will
> be spreading the word on the release there.
>

Will do!

[jira] Resolved: (HIVE-1723) The result of left semi join is not correct

2010-10-26 Thread Liyin Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1723.
--

  Resolution: Fixed
Release Note: This bug is resolved in HIVE-1641

This bug is resolved in HIVE-1641

> The result of left semi join is not correct
> ---
>
> Key: HIVE-1723
> URL: https://issues.apache.org/jira/browse/HIVE-1723
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key 
> sort by a.key;
> I think this query will return a wrong result if table t1 has more than 
> 25000 distinct keys.
> To keep it simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join 
> test_semijoin b on a.key = b.key sort by a.key;
> The table test_semijoin looks like:
> 0 0
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> ......
> ...  
> 25000   25000
> 25001   25001
> ...  
> ...  
> 25999   25999
> 26000   26000
> So the correct result of this query should be exactly the keys of table 
> test_semijoin itself.
> Actually, the result is only part of that: keys 0 through 24544.
> 0
> 1
> 2
> ..
> ..
> 24543
> 24544
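
The expected result described above can be checked against a tiny reference model of a left semi join. This is a sketch in Python under the assumption that the table holds keys 0 through 26000, as the listing suggests; the function name is made up for illustration and nothing here is Hive code.

```python
def left_semi_join(left_keys, right_keys):
    """Keep each left key that has at least one match on the right.

    Rows from the left side are emitted at most once each, no matter how
    many right-side rows match.
    """
    right = set(right_keys)
    return [k for k in left_keys if k in right]


# Self-join of a table holding keys 0..26000: every key matches itself.
keys = list(range(26001))
result = left_semi_join(keys, keys)
```

Under this model the self-join returns every key, so a result that stops at 24544 is missing rows.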


[jira] Resolved: (HIVE-1722) The result of the test case mapjoin1.q is not correct

2010-10-26 Thread Liyin Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1722.
--

  Resolution: Fixed
Release Note: This bug is resolved in HIVE-1641

This bug is resolved in HIVE-1641

> The result of  the test case mapjoin1.q is not correct
> --
>
> Key: HIVE-1722
> URL: https://issues.apache.org/jira/browse/HIVE-1722
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> In the test case mapjoin1.q :
> SELECT  /*+ MAPJOIN(b) */ sum(a.key) as sum_a FROM srcpart a JOIN src b ON 
> a.key = b.key where a.ds is not null;
> The current result in mapjoin1.q.out is 76260.0
> But if the user removes the map join hint and runs the query:
> SELECT  sum(a.key) as sum_a FROM srcpart a JOIN src b ON a.key = b.key where 
> a.ds is not null;
> The result is 1114788.0
> I also imported the input data into MySQL to test, and the result there is 
> also 1114788.0.
> So the current result is not correct.


[jira] Created: (HIVE-1754) Remove JDBM component from Map Join

2010-10-26 Thread Liyin Tang (JIRA)

Remove JDBM component from Map Join
---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


Right now, JDBM is the major performance bottleneck in map join.
As the small table grows, the PUT and GET operations take most of the 
execution time.

Map Join is designed to load the data of small table into memory. 
If the data is too large to hold in memory, then there is no need to use the 
map join strategy.
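
The in-memory strategy described above can be sketched as a plain hash join. This is an illustration in Python, not Hive's implementation; the function name and the dict-per-row representation are assumptions.

```python
def map_join(big_rows, small_rows, key):
    """Join two row streams by loading the small side into an in-memory map."""
    # Build phase: the whole small table goes into a hash table.
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the big table and look up each key.
    for row in big_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}
```

The build phase only works if the small table fits in memory, which is the point made above: once it does not fit, a persistent store behind the map adds cost without making map join the right strategy.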




[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-10-26 Thread Siying Dong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925230#action_12925230
 ] 

Siying Dong commented on HIVE-1750:
---

How about doing this (it is much more expensive but should be right):

We go through the whole expression tree and, for every node, keep a vector of 
results. Each result is for one partition and is true, false or null.
When evaluating a logical expression, we apply it element-wise to the vectors 
of its children. Whenever, for any node, the results are all true or all false, 
we can replace the node with the constant true or false, and potentially remove 
its parent logical operator.

Since we only replace nodes when we know the results for sure, this algorithm 
is guaranteed to be correct.
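
A rough sketch of the idea, in Python rather than Hive's Java, just to make the folding concrete. Assumptions: expressions are nested tuples ("eq", column, value), ("and", l, r) and ("or", l, r); a comparison on a non-partitioning column evaluates to null (None) for every partition; all names are made up for illustration.

```python
def tri_and(a, b):
    """Three-valued AND: False dominates, then unknown (None)."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a, b):
    """Three-valued OR: True dominates, then unknown (None)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def fold(op, left, right):
    """Drop a child that is the neutral constant for op (True for and, False for or)."""
    neutral = op == "and"
    if left is neutral:
        return right
    if right is neutral:
        return left
    return (op, left, right)


def simplify(node, partitions, part_cols):
    """Return (simplified node, per-partition result vector)."""
    op = node[0]
    if op == "eq":
        _, col, val = node
        if col in part_cols:
            vec = [p[col] == val for p in partitions]
        else:
            vec = [None] * len(partitions)  # unknown without reading rows
    else:
        left, lvec = simplify(node[1], partitions, part_cols)
        right, rvec = simplify(node[2], partitions, part_cols)
        combine = tri_and if op == "and" else tri_or
        vec = [combine(a, b) for a, b in zip(lvec, rvec)]
        node = fold(op, left, right)
    # Replace the node only when the answer is certain for every partition.
    if vec and all(v is True for v in vec):
        return True, vec
    if vec and all(v is False for v in vec):
        return False, vec
    return node, vec


parts = [{"ds": "1"}, {"ds": "2"}]
expr = ("and", ("or", ("eq", "ds", "1"), ("eq", "ds", "2")), ("eq", "x", "1"))
simplified, _ = simplify(expr, parts, {"ds"})
```

Here the (ds=1 or ds=2) branch is true for both selected partitions, so it folds away and only the comparison on x is left to evaluate per row.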

> Remove Partition Filtering Conditions when Possible
> ---
>
> Key: HIVE-1750
> URL: https://issues.apache.org/jira/browse/HIVE-1750
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> For some simple queries, partition filtering constraints take 8% of CPU time 
> (now 16% since we filter twice) even if the result is always true. When 
> possible, we should remove these constraints to save CPU times.


[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-10-26 Thread He Yongqiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925229#action_12925229
 ] 

He Yongqiang commented on HIVE-1750:


Under an 'or', if we see a non-partitioning column, the condition cannot be 
removed.
Otherwise, it can be removed.

> Remove Partition Filtering Conditions when Possible
> ---
>
> Key: HIVE-1750
> URL: https://issues.apache.org/jira/browse/HIVE-1750
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> For some simple queries, partition filtering constraints take 8% of CPU time 
> (now 16% since we filter twice) even if the result is always true. When 
> possible, we should remove these constraints to save CPU times.


[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-10-26 Thread Siying Dong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925225#action_12925225
 ] 

Siying Dong commented on HIVE-1750:
---

How about:
(ds=1 and c='1') or (ds=2 or ds=3)

 (ds=2 or ds=3) is removed?

> Remove Partition Filtering Conditions when Possible
> ---
>
> Key: HIVE-1750
> URL: https://issues.apache.org/jira/browse/HIVE-1750
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> For some simple queries, partition filtering constraints take 8% of CPU time 
> (now 16% since we filter twice) even if the result is always true. When 
> possible, we should remove these constraints to save CPU times.


Re: svn move and INFRA-3036

2010-10-26 Thread Edward Capriolo

On Tue, Oct 26, 2010 at 3:10 PM, John Sichi  wrote:
> I'm starting on the svn move in a little bit.  Committers, please hold off on 
> further commits until you see an update on this.
>
> JVS
>
> On Oct 7, 2010, at 10:45 AM, Edward Capriolo wrote:
>
>> All,
>>
>> Part of the move to TLP will require us moving our SVN.
>> https://issues.apache.org/jira/browse/INFRA-3036
>> Infra is going to tackle item #2 soon.
>>
>> After infra creates the new svn, we need to do the svn mv's into it.
>>
>> Users will have to run this in their workspaces:
>> 'svn switch https://svn.apache.org/repos/asf/hive/trunk .'
>>
>> @hive-dev. Once item #2 is completed we should schedule the SVN move.
>> We can do this without any help from infra. So we should schedule this
>> internally (hive-dev).
>>
>> Edward
>
>

Correction: Users should switch their workspace to:

svn switch http://svn.apache.org/repos/asf/hive/

[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-10-26 Thread John Sichi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925209#action_12925209
 ] 

John Sichi commented on HIVE-1750:
--

This is the same logic I have in IndexPredicateAnalyzer.analyzePredicate.  And 
it can be configured with the specific set of columns to allow.  So you might 
be able to reuse it as is.


> Remove Partition Filtering Conditions when Possible
> ---
>
> Key: HIVE-1750
> URL: https://issues.apache.org/jira/browse/HIVE-1750
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> For some simple queries, partition filtering constraints take 8% of CPU time 
> (now 16% since we filter twice) even if the result is always true. When 
> possible, we should remove these constraints to save CPU times.


[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-10-26 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925201#action_12925201
 ] 

Namit Jain commented on HIVE-1750:
--

I think we can use the following rules:

If the predicates contain only ANDs, remove the predicate containing 
partitioning columns


In case of any UDF (including OR):
  Get the columns used as arguments

If all columns in the parameters are partitioning columns, we can remove the 
partitioning predicate.
If the partitioning predicate contains a non-partitioning column, we cannot 
remove it.


For example:


If the condition is:

ds=1 or ds=2

ds=1 and x=1

we can remove the conditions for partitioning columns.


However, if the condition is:

ds = 1 or x = 1

we cannot modify the condition.

We can go over all UDFs and mark the ones which behave like AND as such.
Any unknown UDF is treated like OR.
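
These rules can be sketched as follows. This is a Python illustration only, not Hive code: it assumes predicates are nested tuples with ("col", name) and ("const", value) leaves, treats every operator other than "and" (including OR and any UDF) the same way, and assumes partition pruning has already selected the partitions that satisfy the partitioning conditions. All names are hypothetical.

```python
def columns(node):
    """Collect the column names referenced anywhere under node."""
    kind = node[0]
    if kind == "col":
        return {node[1]}
    if kind == "const":
        return set()
    cols = set()
    for arg in node[1:]:
        cols |= columns(arg)
    return cols


def strip_partition_predicates(node, part_cols):
    """Return the predicate without removable parts, or None if all of it goes."""
    cols = columns(node)
    if cols and cols <= part_cols:
        return None  # only partitioning columns: removable
    if node[0] == "and":
        kept = []
        for child in node[1:]:
            child = strip_partition_predicates(child, part_cols)
            if child is not None:
                kept.append(child)
        if not kept:
            return None
        if len(kept) == 1:
            return kept[0]
        return ("and", *kept)
    # OR, or any other function that mixes in a non-partitioning column,
    # is left untouched.
    return node


def eq(col, val):
    """Helper to build col = val."""
    return ("=", ("col", col), ("const", val))
```

With ds as the only partitioning column, the three conditions above come out as: "ds=1 or ds=2" is removed entirely, "ds=1 and x=1" keeps only x=1, and "ds = 1 or x = 1" is returned unchanged.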

> Remove Partition Filtering Conditions when Possible
> ---
>
> Key: HIVE-1750
> URL: https://issues.apache.org/jira/browse/HIVE-1750
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>
> For some simple queries, partition filtering constraints take 8% of CPU time 
> (now 16% since we filter twice) even if the result is always true. When 
> possible, we should remove these constraints to save CPU times.


release 0.6.0 wrapup

2010-10-26 Thread John Sichi

Since the release vote passed, I've gone ahead and moved the tag from candidate 
to release, and then copied the release binaries to the distribution directory. 
 According to the instructions, those will take about 24 hours to propagate to 
all mirrors.

Ed, I think you mentioned in IRC that you are going to work on updating 
http://hive.apache.org?  Besides the releases/news, we should get rid of 
references to Hive as a Hadoop subproject.  (For the bar at the top, I checked 
Pig and HBase, and they still have Hadoop logo and nav menu, so I guess we can 
leave that for now.)  We should also update the mailing list addresses (but 
leave the wiki URL alone for now).  Let me know whether you're going to take 
care of these or you want me to.

Carl, as release manager, can you send out the release announcement once 
everything is ready?  I'll be at ApacheCon US next week in Atlanta and will be 
spreading the word on the release there.

JVS

Re: svn repository move complete

2010-10-26 Thread John Sichi

Thanks.  By the way, I saw an ASF announcement that there's now a review board 
instance available, so we should probably move there.

https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the

https://reviews.apache.org

JVS

On Oct 26, 2010, at 2:16 PM, Carl Steinbach wrote:

> I filed a request with ASF INFRA to update the Git mirror:
> https://issues.apache.org/jira/browse/INFRA-3107
> 
> Carl
> 
> On Tue, Oct 26, 2010 at 1:46 PM, John Sichi  wrote:
> 
>> If you have outstanding checkouts (including ones with changes) you can
>> update them using svn switch:
>> 
>> svn switch https://svn.apache.org/repos/asf/hive/trunk
>> 
>> The above assumes you have trunk checked out (with https for committing).
>> If you instead have a branch checked out, or are using http, then adjust
>> the URL accordingly.
>> 
>> I'll update the wiki etc with the new location.
>> 
>> JVS
>> 
>>

Re: svn repository move complete

2010-10-26 Thread Carl Steinbach

I filed a request with ASF INFRA to update the Git mirror:
https://issues.apache.org/jira/browse/INFRA-3107

Carl

On Tue, Oct 26, 2010 at 1:46 PM, John Sichi  wrote:

> If you have outstanding checkouts (including ones with changes) you can
> update them using svn switch:
>
> svn switch https://svn.apache.org/repos/asf/hive/trunk
>
> The above assumes you have trunk checked out (with https for committing).
>  If you instead have a branch checked out, or are using http, then adjust
> the URL accordingly.
>
> I'll update the wiki etc with the new location.
>
> JVS
>
>

svn repository move complete

2010-10-26 Thread John Sichi

If you have outstanding checkouts (including ones with changes) you can update 
them using svn switch:

svn switch https://svn.apache.org/repos/asf/hive/trunk

The above assumes you have trunk checked out (with https for committing).  If 
you instead have a branch checked out, or are using http, then adjust the URL 
accordingly.

I'll update the wiki etc with the new location.

JVS

[jira] Updated: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-10-26 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1326:
-

Component/s: Query Processor
Description: 
In our production hadoop environment, the "/tmp/" is actually pretty small, and 
we encountered a problem when a query used the RowContainer class and filled up 
the /tmp/ partition.  I tracked down the cause to the RowContainer class 
putting temporary files in the '/tmp/' path instead of using the configured 
Hadoop temporary path.  I've attached a patch to fix this.
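
The general shape of such a fix can be sketched as below. This is a Python illustration, not the attached patch: the function name is made up, the configuration is modelled as a plain dict, and the use of the "hadoop.tmp.dir" key as the configured temporary path is an assumption.

```python
import os
import tempfile


def create_spill_file(conf: dict) -> str:
    """Create an empty spill file under the configured temporary directory."""
    # Fall back to the system default only when nothing is configured,
    # instead of hard-coding '/tmp/'.
    base = conf.get("hadoop.tmp.dir") or tempfile.gettempdir()
    os.makedirs(base, exist_ok=True)
    spill_dir = tempfile.mkdtemp(prefix="hive-rowcontainer-", dir=base)
    fd, path = tempfile.mkstemp(prefix="RowContainer", suffix=".tmp", dir=spill_dir)
    os.close(fd)
    return path
```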

Here's the traceback:

2010-04-25 12:05:05,120 INFO 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
rows: used memory = 385520312
2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
rows: used memory = 341780472
2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
rows: used memory = 301446768
2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
rows: used memory = 399208768
2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
rows: used memory = 364507216
2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
rows: used memory = 332907280
2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
rows: used memory = 298774096
2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
rows: used memory = 396505408
2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
rows: used memory = 362477288
2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
rows: used memory = 327229744
2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
rows: used memory = 296051904
2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
java.io.IOException: No space left on device
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
at 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
... 22 more


  was:

In our production hadoop environment, the "/tmp/" is actually pretty small, and 
we encountered a problem when a query used the RowContainer class and filled up 
the /tmp/ partition.  I tracked down the cause to the RowContainer class 
putting temporary files in the '/tmp/' path instead of using the configured 
Hadoop temporary path.  I've attached a patch to fix this.

Here's the traceback:

2010-04-25 12:05:05,120 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /tmp/hive-rowcontainer-1244151903/RowContainer78
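
For illustration, the fix described above (creating spill files under a configured temporary directory rather than a hard-coded /tmp) might look roughly like the following Java sketch. This is not the attached patch: the class SpillFileLocation, the method createSpillFile, and the directory names are hypothetical, and the real RowContainer code would presumably take the directory from the Hadoop/Hive configuration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical sketch; not the actual RowContainer implementation.
class SpillFileLocation {

    // Create a spill file under the directory chosen by configuration,
    // instead of relying on a fixed "/tmp" location.
    static File createSpillFile(File configuredTmpDir) throws IOException {
        if (!configuredTmpDir.isDirectory() && !configuredTmpDir.mkdirs()) {
            throw new IOException("cannot create " + configuredTmpDir);
        }
        File spill = File.createTempFile("RowContainer", ".tmp", configuredTmpDir);
        spill.deleteOnExit();
        return spill;
    }

    // Returns true if the spill file was created inside the configured directory.
    static boolean demo() {
        try {
            // Stand-in for a configured local directory (an assumption for this demo).
            File base = Files.createTempDirectory("configured-local-dir").toFile();
            base.deleteOnExit();
            File dir = new File(base, "hive-rowcontainer");
            dir.deleteOnExit();
            File spill = createSpillFile(dir);
            return spill.getParentFile().equals(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("spill file under configured dir: " + demo());
    }
}
```

The point of the sketch is only that the directory is a parameter: on a cluster where /tmp is small, the operator can point it at a larger volume without code changes.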

[jira] Updated: (HIVE-1749) ExecMapper and ExecReducer: reduce function calls to l4j.isInfoEnabled()

2010-10-26 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1749:
-

Component/s: Query Processor
Summary: ExecMapper and ExecReducer: reduce function calls to 
l4j.isInfoEnabled()  (was: ExecMapper and ExecReducer reduce function calls to 
l4j.isInfoEnabled())

> ExecMapper and ExecReducer: reduce function calls to l4j.isInfoEnabled()
> 
>
> Key: HIVE-1749
> URL: https://issues.apache.org/jira/browse/HIVE-1749
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HIVE-1749.1.patch
>
>
> Calling l4j.isInfoEnabled() is more expensive than we thought. By eliminating 
> this function call, we can save 1% - 3% CPU time, according to the profiling.
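
The attached patch is not reproduced in this message. One common way to avoid a per-row isInfoEnabled() call is to evaluate it once and keep the result in a field, sketched below in Java. The InfoLog interface, CountingLog, and RowProcessor are hypothetical stand-ins for illustration (the real code uses the l4j logger in ExecMapper and ExecReducer), and the trade-off is an assumption worth stating: a cached flag will not notice a log level that changes at runtime.

```java
// Hypothetical stand-in for the logger used by ExecMapper/ExecReducer.
interface InfoLog {
    boolean isInfoEnabled();
    void info(String msg);
}

// Test double that counts how often each method is called.
class CountingLog implements InfoLog {
    int checks = 0;
    int messages = 0;
    public boolean isInfoEnabled() { checks++; return true; }
    public void info(String msg) { messages++; }
}

class RowProcessor {
    private final InfoLog log;
    // Evaluated once at construction instead of once per row.
    private final boolean isLogInfoEnabled;

    RowProcessor(InfoLog log) {
        this.log = log;
        this.isLogInfoEnabled = log.isInfoEnabled();
    }

    void process(long rowNumber) {
        // The hot path only reads a boolean field.
        if (isLogInfoEnabled && rowNumber % 1000 == 0) {
            log.info("processed " + rowNumber + " rows");
        }
    }
}

class CachedLogLevelDemo {
    // Returns {number of isInfoEnabled() calls, number of info() messages}.
    static int[] run(int rows) {
        CountingLog log = new CountingLog();
        RowProcessor processor = new RowProcessor(log);
        for (long i = 1; i <= rows; i++) {
            processor.process(i);
        }
        return new int[] { log.checks, log.messages };
    }

    public static void main(String[] args) {
        int[] r = run(5000);
        System.out.println("isInfoEnabled calls: " + r[0] + ", messages: " + r[1]);
    }
}
```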

[jira] Commented: (HIVE-1731) Improve miscellaneous error messages

2010-10-26 Thread Adam Kramer (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925098#action_12925098
 ] 

Adam Kramer commented on HIVE-1731:
---

From a UNION ALL query:

FAILED: Error in semantic analysis: Schema of both sides of union should match: 
destinationid:_col1 _col2

...this should 1) provide a line number where the error is, 2) say how the 
schemata mismatch, and 3) use actual column names. destinationid is an actual 
column name, but I have no idea what _col1 and _col2 refer to.

When I have 10 UNION ALLs on top of each other, this error message is very 
aggravating.

> Improve miscellaneous error messages
> 
>
> Key: HIVE-1731
> URL: https://issues.apache.org/jira/browse/HIVE-1731
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: John Sichi
> Fix For: 0.7.0
>
>
> This is a place for accumulating error message improvements so that we can 
> update a bunch in batch.

Hudson build is back to normal : Hive-trunk-h0.20 #404

2010-10-26 Thread Apache Hudson Server

See

Re: svn move and INFRA-3036

2010-10-26 Thread John Sichi

I'm starting on the svn move in a little bit.  Committers, please hold off on 
further commits until you see an update on this.

JVS

On Oct 7, 2010, at 10:45 AM, Edward Capriolo wrote:

> All,
> 
> Part of the move to TLP will require us moving our SVN.
> https://issues.apache.org/jira/browse/INFRA-3036
> Infra is going to tackle item #2 soon.
> 
> After infra creates the new svn, we need to do the svn mv's into it.
> 
> Users will have to run for their workspaces:
> 'svn switch https://svn.apache.org/repos/asf/hive/trunk .'
> 
> @hive-dev. Once item #2 is completed we should schedule the SVN move.
> We can do this without any help from infra. So we should schedule this
> internally (hive-dev).
> 
> Edward

Re: [VOTE] hive 0.6.0 release candidate 0

2010-10-26 Thread Ning Zhang

+1. all unit tests passed for me. 

On Oct 26, 2010, at 10:59 AM, Ashish Thusoo wrote:

> +1 from me as well. Ran the tests and aside from those that I mentioned 
> everything passed cleanly.
> 
> Ashish
> 
> From: Edward Capriolo [edlinuxg...@gmail.com]
> Sent: Monday, October 25, 2010 7:14 PM
> To: dev@hive.apache.org
> Subject: Re: [VOTE] hive 0.6.0 release candidate 0
> 
> On Mon, Oct 25, 2010 at 10:00 PM, John Sichi  wrote:
>> At the Hive contributor meeting, we discussed this and came to the 
>> conclusion that the failures reported so far are ignorable based on the 
>> Hudson history (and in this case Ning's observation of JVM inconsistencies 
>> with respect to serialization format).
>> 
>> We need one more +1 from a committer before we can release.
>> 
>> JVS
>> 
>> On Oct 25, 2010, at 12:54 PM, Ashish Thusoo wrote:
>> 
>>> I got the following test failures on the release candidate...
>>> 
>>> groupby2.q
>>> groupby3.q
>>> groupby4.q
>>> groupby5.q
>>> groupby6.q
>>> 
>>> not sure if this is just in my env or if others have seen this...
>>> 
>>> A sample of the diff is below and seems to be related to some plan ordering 
>>> or some change in plan. Is anyone else getting this?
>>> 
>>> Ashish
>>> 
>>> -
>>>   [junit] diff -b -I'\(\(>> class="java.beans.XMLDecoder">\)\|\(.*/tmp/.*\)\|\(file:.*\)\|\([0-9]\{10\}\)\|\(/.*/warehouse/.*\)\)'
>>>  
>>> /data/users/athusoo/tmp/hive-0.6.0/src/build/ql/test/logs/positive/groupby6.q.xml
>>>  
>>> /data/users/athusoo/tmp/hive-0.6.0/src/ql/src/test/results/compiler/plan/groupby6.q.xml
>>>   [junit] 352,353c352
>>>   [junit] <>> method="valueOf">
>>>   [junit] < 
>>> org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode
>>>   [junit] ---
>>>   [junit] >>> class="org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode" method="valueOf">
>>>   [junit] 878,879c877
>>>   [junit] <  
>>>   [junit] <   
>>> org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode
>>>   [junit] ---
>>>   [junit] >  >> class="org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode" method="valueOf">
>>> 
>>> --
>>> 
>>> 
>>> From: John Sichi [jsi...@facebook.com]
>>> Sent: Thursday, October 21, 2010 12:22 PM
>>> To: 
>>> Subject: Re: [VOTE] hive 0.6.0 release candidate 0
>>> 
>>> Yeah, the scripts should only be needed in configurations where JDO is told 
>>> not to automatically update the schema.  This is recommended for production 
>>> environments.
>>> 
>>> For this particular release, taking a downtime while running the scripts is 
>>> a good idea due to the nature of the changes (e.g. altering the primary key 
>>> on COLS).  That needn't be true in general for additive-only changes.
>>> 
>>> JVS
>>> 
>>> On Oct 21, 2010, at 12:14 PM, Edward Capriolo wrote:
>>> 
>>>> On Wed, Oct 20, 2010 at 6:38 PM, John Sichi  wrote:
>>>>> The tarballs are at
>>>>> 
>>>>> http://people.apache.org/~jvs/hive-0.6.0-candidate-0
>>>>> 
>>>>> Carl did some sanity testing on it already, but any additional testing 
>>>>> you can do before voting helps to ensure a quality release.
>>>>> 
>>>>> JVS
>>>>> 
>>>>> 
>>>> 
>>>> I am checking it out now. It seems like since i have used two trunk
>>>> versions since hive the view related tables have already been created.
>>>> I do not need the update script.
>>> 
>> 
>> 
> 
> I checked out. Created views, ran some queries against them, tested
> the new local mode, web interface looks good. +1 Great work everyone.

[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-474:


Status: Open  (was: Patch Available)

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474.txt
>
>
> The ability to select distinct values of several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user

[jira] Commented: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-26 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925082#action_12925082
 ] 

Namit Jain commented on HIVE-1672:
--

The patch failed to apply cleanly after HIVE-1641.

Can you regenerate the patch?

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Shrikrishna Lawande
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1672-1.txt, patch-1672.txt
>
>
> Executing a join query where one of the tables is a fact table would fail 
> during the table scan of the fact table. This usually happens when one of the 
> tasks is scanning a large number of rows (say 200 thousand rows in my case) and 
> the task fails to respond within the timeout window.
> The workaround for this is to set a very large timeout for the task. I could 
> manage to run the query by setting the timeout to 0 (infinite).
> To repro:
> Run a join query with a couple of tables, one of which is a fact table. In my 
> env, the fact table has 40TB of data with more than a billion rows. Most of the 
> map tasks are processing over 200 thousand rows. 
> A few of the tasks take more than 30 min to respond and fail, since the default 
> task timeout is 10 min.

[jira] Updated: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-26 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1672:
-

Status: Open  (was: Patch Available)

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Shrikrishna Lawande
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1672-1.txt, patch-1672.txt
>
>
> Executing a join query where one of the tables is a fact table would fail 
> during the table scan of the fact table. This usually happens when one of the 
> tasks is scanning a large number of rows (say 200 thousand rows in my case) and 
> the task fails to respond within the timeout window.
> The workaround for this is to set a very large timeout for the task. I could 
> manage to run the query by setting the timeout to 0 (infinite).
> To repro:
> Run a join query with a couple of tables, one of which is a fact table. In my 
> env, the fact table has 40TB of data with more than a billion rows. Most of the 
> map tasks are processing over 200 thousand rows. 
> A few of the tasks take more than 30 min to respond and fail, since the default 
> task timeout is 10 min.

[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925080#action_12925080
 ] 

Namit Jain commented on HIVE-474:
-

Can you refresh and regenerate the patch? I am getting some compile errors 
after applying it to trunk.


[javac] /data/users/njain/hive-commit1/ql/build.xml:159: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 622 source files to /data/users/njain/hive-commit1/build/ql/classes
[javac] /data/users/njain/hive-commit1/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:204: cannot find symbol
[javac] symbol  : class StructField
[javac] location: class org.apache.hadoop.hive.ql.exec.GroupByOperator
[javac] List sfs =
[javac]^
[javac] /data/users/njain/hive-commit1/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:205: cannot find symbol
[javac] symbol  : class StandardStructObjectInspector
[javac] location: class org.apache.hadoop.hive.ql.exec.GroupByOperator
[javac]   ((StandardStructObjectInspector) rowInspector).getAllStructFieldRefs();
[javac] ^
[javac] /data/users/njain/hive-commit1/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:207: cannot find symbol
[javac] symbol  : class StructField
[javac] location: class org.apache.hadoop.hive.ql.exec.GroupByOperator
[javac]   StructField keyField = sfs.get(0);
[javac]   ^
[javac] /data/users/njain/hive-commit1/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:211: cannot find symbol
[javac] symbol  : class StandardStructObjectInspector
[javac] location: class org.apache.hadoop.hive.ql.exec.GroupByOperator
[javac] if (keyObjInspector instanceof StandardStructObjectInspector) {
[javac]^



Most probably, some merge issue

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474.txt
>
>
> The ability to select distinct values of several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user

[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925081#action_12925081
 ] 

Namit Jain commented on HIVE-474:
-

+1

Otherwise, the changes look good

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474.txt
>
>
> The ability to select distinct values of several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user

[jira] Commented: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread He Yongqiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925078#action_12925078
 ] 

He Yongqiang commented on HIVE-1753:


+1.

> HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
> -
>
> Key: HIVE-1753
> URL: https://issues.apache.org/jira/browse/HIVE-1753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
> Attachments: HIVE-1753.patch
>
>
> Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Assigned: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread He Yongqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1753:
--

Assignee: Thiruvel Thirumoolan

> HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
> -
>
> Key: HIVE-1753
> URL: https://issues.apache.org/jira/browse/HIVE-1753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
> Attachments: HIVE-1753.patch
>
>
> Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Resolved: (HIVE-1641) add map joined table to distributed cache

2010-10-26 Thread He Yongqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1641.


Resolution: Fixed

I just committed! Thanks Liyin!

> add map joined table to distributed cache
> -
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1641(3).txt, Hive-1641(4).patch, 
> Hive-1641(5).patch, Hive-1641.patch
>
>
> Currently, the mappers directly read the map-joined table from HDFS, which 
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers is beyond a 
> few thousand, due to 
> concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and read 
> from there instead.

[jira] Updated: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1753:
---

Attachment: HIVE-1753.patch

> HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
> -
>
> Key: HIVE-1753
> URL: https://issues.apache.org/jira/browse/HIVE-1753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
> Attachments: HIVE-1753.patch
>
>
> Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Updated: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1753:
---

Status: Patch Available  (was: Open)

> HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
> -
>
> Key: HIVE-1753
> URL: https://issues.apache.org/jira/browse/HIVE-1753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
> Attachments: HIVE-1753.patch
>
>
> Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Commented: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925075#action_12925075
 ] 

Thiruvel Thirumoolan commented on HIVE-1753:


Sreekanth Ramakrishnan fixed it and the patch works for me. Will upload the 
patch.

> HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
> -
>
> Key: HIVE-1753
> URL: https://issues.apache.org/jira/browse/HIVE-1753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
>
> Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Created: (HIVE-1753) HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat

2010-10-26 Thread Thiruvel Thirumoolan (JIRA)

HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
-

 Key: HIVE-1753
 URL: https://issues.apache.org/jira/browse/HIVE-1753
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.7.0


Errors are the same as HIVE-1633 but I see them for Stage-2 jobs.

[jira] Assigned: (HIVE-1748) Statistics broken for tables with size in excess of Integer.MAX_VALUE

2010-10-26 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1748:


Assignee: Paul Butler

> Statistics broken for tables with size in excess of Integer.MAX_VALUE
> -
>
> Key: HIVE-1748
> URL: https://issues.apache.org/jira/browse/HIVE-1748
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Paul Butler
>Assignee: Paul Butler
> Fix For: 0.7.0
>
> Attachments: HIVE-1748.patch
>
>
> ANALYZE TABLE x COMPUTE STATISTICS would fail to update the table size if it 
> exceeded Integer.MAX_VALUE because it used parseInt instead of parseLong.
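
The difference between the two parse calls can be seen in a small, self-contained Java sketch. The class and method names (TableSizeParsing, parsesAsInt, parseSize) and the sample size are hypothetical and are not taken from the attached patch; only Integer.parseInt and Long.parseLong are the real JDK calls named in the description.

```java
// Hypothetical illustration of the parseInt vs parseLong behaviour.
class TableSizeParsing {

    // Returns true if the decimal string fits in a 32-bit int.
    static boolean parsesAsInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            // Values above Integer.MAX_VALUE (2147483647) end up here.
            return false;
        }
    }

    // Parsing as a 64-bit long accepts sizes well beyond 2 GB.
    static long parseSize(String value) {
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        String size = "3000000000"; // assumed table size in bytes, about 3 GB
        System.out.println("fits in int: " + parsesAsInt(size));
        System.out.println("as long: " + parseSize(size));
    }
}
```

Any code path that parses a byte count with Integer.parseInt will throw NumberFormatException for tables larger than about 2 GB, which is consistent with the failure described above.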

[jira] Resolved: (HIVE-1748) Statistics broken for tables with size in excess of Integer.MAX_VALUE

2010-10-26 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1748.
--

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]

Committed. Thanks Paul

> Statistics broken for tables with size in excess of Integer.MAX_VALUE
> -
>
> Key: HIVE-1748
> URL: https://issues.apache.org/jira/browse/HIVE-1748
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Paul Butler
> Fix For: 0.7.0
>
> Attachments: HIVE-1748.patch
>
>
> ANALYZE TABLE x COMPUTE STATISTICS would fail to update the table size if it 
> exceeded Integer.MAX_VALUE because it used parseInt instead of parseLong.

[jira] Updated: (HIVE-1749) ExecMapper and ExecReducer reduce function calls to l4j.isInfoEnabled()

2010-10-26 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1749:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Siying

> ExecMapper and ExecReducer reduce function calls to l4j.isInfoEnabled()
> ---
>
> Key: HIVE-1749
> URL: https://issues.apache.org/jira/browse/HIVE-1749
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HIVE-1749.1.patch
>
>
> Calling l4j.isInfoEnabled() is more expensive than we thought. By eliminating 
> this function call, we can save 1% - 3% CPU time, according to the profiling.

RE: [VOTE] hive 0.6.0 release candidate 0

2010-10-26 Thread Ashish Thusoo

+1 from me as well. Ran the tests and aside from those that I mentioned 
everything passed cleanly.

Ashish

From: Edward Capriolo [edlinuxg...@gmail.com]
Sent: Monday, October 25, 2010 7:14 PM
To: dev@hive.apache.org
Subject: Re: [VOTE] hive 0.6.0 release candidate 0

On Mon, Oct 25, 2010 at 10:00 PM, John Sichi  wrote:
> At the Hive contributor meeting, we discussed this and came to the conclusion 
> that the failures reported so far are ignorable based on the Hudson history 
> (and in this case Ning's observation of JVM inconsistencies with respect to 
> serialization format).
>
> We need one more +1 from a committer before we can release.
>
> JVS
>
> On Oct 25, 2010, at 12:54 PM, Ashish Thusoo wrote:
>
>> I got the following test failures on the release candidate...
>>
>> groupby2.q
>> groupby3.q
>> groupby4.q
>> groupby5.q
>> groupby6.q
>>
>> not sure if this is just in my env or if others have seen this...
>>
>> A sample of the diff is below and seems to be related to some plan ordering 
>> or some change in plan. Is anyone else getting this?
>>
>> Ashish
>>
>> -
>>[junit] diff -b -I'\(\(> class="java.beans.XMLDecoder">\)\|\(.*/tmp/.*\)\|\(file:.*\)\|\([0-9]\{10\}\)\|\(/.*/warehouse/.*\)\)'
>>  
>> /data/users/athusoo/tmp/hive-0.6.0/src/build/ql/test/logs/positive/groupby6.q.xml
>>  
>> /data/users/athusoo/tmp/hive-0.6.0/src/ql/src/test/results/compiler/plan/groupby6.q.xml
>>[junit] 352,353c352
>>[junit] <> method="valueOf">
>>[junit] < 
>> org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode
>>[junit] ---
>>[junit] >> class="org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode" method="valueOf">
>>[junit] 878,879c877
>>[junit] <  
>>[junit] <   
>> org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode
>>[junit] ---
>>[junit] >  > class="org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode" method="valueOf">
>>
>> --
>>
>> 
>> From: John Sichi [jsi...@facebook.com]
>> Sent: Thursday, October 21, 2010 12:22 PM
>> To: 
>> Subject: Re: [VOTE] hive 0.6.0 release candidate 0
>>
>> Yeah, the scripts should only be needed in configurations where JDO is told 
>> not to automatically update the schema.  This is recommended for production 
>> environments.
>>
>> For this particular release, taking a downtime while running the scripts is 
>> a good idea due to the nature of the changes (e.g. altering the primary key 
>> on COLS).  That needn't be true in general for additive-only changes.
>>
>> JVS
>>
>> On Oct 21, 2010, at 12:14 PM, Edward Capriolo wrote:
>>
>>> On Wed, Oct 20, 2010 at 6:38 PM, John Sichi  wrote:
>>>> The tarballs are at
>>>>
>>>> http://people.apache.org/~jvs/hive-0.6.0-candidate-0
>>>>
>>>> Carl did some sanity testing on it already, but any additional testing you 
>>>> can do before voting helps to ensure a quality release.
>>>>
>>>> JVS
>>>>
>>>>
>>>
>>> I am checking it out now. It seems like since i have used two trunk
>>> versions since hive the view related tables have already been created.
>>> I do not need the update script.
>>
>
>

I checked out. Created views, ran some queries against them, tested
the new local mode, web interface looks good. +1 Great work everyone.

[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-26 Thread Paul Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1575:


Status: Open  (was: Patch Available)

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Mike Lewis
> Attachments: 
> 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch
>
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.

[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-26 Thread Paul Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925059#action_12925059
 ] 

Paul Yang commented on HIVE-1575:
-

It's hard to say, but my guess is that a regex will be slower than those string 
operations. Same thing with the cache. What might be good to do is compare the 
performance before and after these changes. Do you have a dataset that you 
could use to test?

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Mike Lewis
> Attachments: 
> 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch
>
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.

[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-10-26 Thread Jeremy Hanna (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925052#action_12925052
 ] 

Jeremy Hanna commented on HIVE-1434:


Any update to the status of this ticket?

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.

[jira] Commented: (HIVE-1641) add map joined table to distributed cache

2010-10-26 Thread Liyin Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925035#action_12925035
 ] 

Liyin Tang commented on HIVE-1641:
--

The patch without jdbm is also ready.
Shall I submit that patch?

> add map joined table to distributed cache
> -
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1641(3).txt, Hive-1641(4).patch, 
> Hive-1641(5).patch, Hive-1641.patch
>
>
> Currently, the mappers directly read the map-joined table from HDFS, which 
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers is beyond a 
> few thousand, due to 
> concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and read 
> from there instead.

[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-10-26 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925018#action_12925018
 ] 

Pradeep Kamath commented on HIVE-1526:
--

Any update on this? I would like to submit a patch for HIVE-1696, which depends 
on this issue and HIVE-842. Since this is currently broken against trunk, I am 
waiting for the new patch (based on thrift-0.5?) so I can generate a patch for 
HIVE-1696.

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure, Clients
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Fix For: 0.7.0
>
> Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
> libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363

[jira] Updated: (HIVE-1583) Hive should not override Hadoop specific system properties

2010-10-26 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1583:
--

Status: Patch Available  (was: Open)

Marking this as Patch Available.

> Hive should not override Hadoop specific system properties
> --
>
> Key: HIVE-1583
> URL: https://issues.apache.org/jira/browse/HIVE-1583
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Amareshwari Sriramadasu
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1583.patch, HIVE-1583_2.patch
>
>
> Currently, Hive overrides Hadoop-specific system properties such as 
> HADOOP_CLASSPATH.
> It does the following in the bin/hive script:
> {code}
> # pass classpath to hadoop
> export HADOOP_CLASSPATH=${CLASSPATH}
> {code}
> Instead, it should honor the value of HADOOP_CLASSPATH set by the client by 
> appending CLASSPATH to it.
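
A minimal sketch of the append described above, assuming a POSIX shell. The helper name `append_classpath` is hypothetical and is not taken from bin/hive or from either attached patch; it only illustrates joining the two values without leaving a stray colon when one side is empty.

```shell
# Hypothetical helper (an assumption for illustration, not Hive code):
# join two colon-separated path lists, skipping whichever side is empty.
append_classpath() {
  existing="$1"
  extra="$2"
  if [ -z "$existing" ]; then
    printf '%s\n' "$extra"
  elif [ -z "$extra" ]; then
    printf '%s\n' "$existing"
  else
    printf '%s\n' "${existing}:${extra}"
  fi
}

# pass classpath to hadoop, keeping any value the client already set
HADOOP_CLASSPATH=$(append_classpath "${HADOOP_CLASSPATH:-}" "${CLASSPATH:-}")
export HADOOP_CLASSPATH
```

With this shape, a client that exports HADOOP_CLASSPATH before invoking the script keeps its entries at the front, and the script's CLASSPATH follows them.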

[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Status: Patch Available  (was: Open)

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474.txt
>
>
> The ability to select distinct values on several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;
> This currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user

[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-26 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Attachment: patch-474-2.txt

bq. Not a good idea to ignore skew for multiple distincts.
I agree.

The updated patch throws an error when there are multiple distincts with skew in 
the data. It also adds negative test cases.


> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474-1.txt, 
> patch-474-2.txt, patch-474.txt
>
>
> The ability to select distinct values on several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;
> This currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user
