[jira] Updated: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-02-25 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1198:
---

Status: Patch Available  (was: Open)

This patch addresses the issue as follows:

1. The match pattern in the .checkstyle file has been changed to exclude the 
top-level {{ant}} directory from the checkstyle application path. Because it 
is not a valid source directory, checkstyle reports errors for all matching 
files within it.

2. The build file, build.xml, has been modified to exclude the {{ant}} 
directory from processing under the checkstyle target, for the same reason.

3. The {{eclipse-templates/.project}} file has been modified to include the 
{{CheckstyleBuilder}} and {{CheckstyleNature}}. This automatically enables 
checkstyle when the project is imported into Eclipse. I tested importing the 
project into Eclipse without the checkstyle plugin installed, and it did not 
seem to cause any noticeable problem.

4. The section on {{Developing Hive using Eclipse}} in the README.txt has been 
updated to state that the user must install the Checkstyle plugin in Eclipse, 
if it is not already present, before importing the project.

 When checkstyle is activated for Hive in Eclipse environment, it shows all 
 checkstyle problems as errors.
 -

 Key: HIVE-1198
 URL: https://issues.apache.org/jira/browse/HIVE-1198
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
 Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
Reporter: Arvind Prabhakar
Priority: Minor
 Attachments: HIVE-1198.patch


 Currently the checkstyle plugin reports all problems as errors. This produces 
 an overwhelming number of errors (3000+), which masks any real errors that 
 might be there. Since the checkstyle violations are not all going to be fixed 
 in one shot, it is desirable to lower the severity of checkstyle violations 
 to warnings so that the plugin can be kept enabled. This will encourage 
 developers to spot checkstyle violations in the files they touch and 
 fix them as they go along, and will point out new violations as they code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-02-25 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838508#action_12838508
 ] 

Arvind Prabhakar commented on HIVE-1198:


I missed one change description in my previous comment:

5. Modified {{checkstyle/checkstyle.xml}} to set the default value of 
{{severity}} to {{warning}}.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Jerome Boulon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838512#action_12838512
 ] 

Jerome Boulon commented on HIVE-259:


Can someone explain how I can create and populate a new table to be used by 
the ant test target?


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles




Hive unit-test table

2010-02-25 Thread Jerome Boulon
Can someone explain how I can create and populate a new table to be used by
the ant test target?
Thanks in advance,
  /Jerome.


Hudson build is back to normal : Hive-trunk-h0.20 #199

2010-02-25 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/199/changes




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838516#action_12838516
 ] 

Carl Steinbach commented on HIVE-259:
-

@Jerome: take a look at ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java





Re: Hive unit-test table

2010-02-25 Thread Carl Steinbach
Take a look at ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

Carl

On Thu, Feb 25, 2010 at 12:03 PM, Jerome Boulon jbou...@netflix.com wrote:

 Can someone explain how can I create/populate a new table to be used by the
 ant test target?
 Thanks in advance,
   /Jerome.



[jira] Updated: (HIVE-1137) build references IVY_HOME incorrectly

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1137:
-

  Resolution: Fixed
Release Note: HIVE-1137. Fix build.xml for references to IVY_HOME. (Carl 
Steinbach via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Carl!

 build references IVY_HOME incorrectly
 -

 Key: HIVE-1137
 URL: https://issues.apache.org/jira/browse/HIVE-1137
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Carl Steinbach
 Fix For: 0.6.0

 Attachments: HIVE-1137.patch


 The build references env.IVY_HOME, but doesn't actually import env as it 
 should (via <property environment="env"/>).
 It's not clear what the IVY_HOME reference is for since the build doesn't 
 even use ivy.home (instead, it installs under the build/ivy directory).
 It looks like someone copied bits and pieces from the "Automatically" section 
 here:
 http://ant.apache.org/ivy/history/latest-milestone/install.html




[jira] Assigned: (HIVE-1197) create a new input format where a mapper spans a file

2010-02-25 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1197:


Assignee: Siying Dong  (was: Namit Jain)

 create a new input format where a mapper spans a file
 -

 Key: HIVE-1197
 URL: https://issues.apache.org/jira/browse/HIVE-1197
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.6.0


 This will be needed for Sort merge joins.




[jira] Assigned: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1193:


Assignee: Namit Jain

 ensure sorting properties for a table
 -

 Key: HIVE-1193
 URL: https://issues.apache.org/jira/browse/HIVE-1193
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0


 If a table is sorted and data is being inserted into it, we currently 
 don't make sure that the data is sorted. Enforcing this might be useful for 
 some downstream operations.
 This cannot be made the default due to backward compatibility, but an option 
 can be added for it.




[jira] Updated: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1193:
-

Status: Patch Available  (was: Open)

 ensure sorting properties for a table
 -

 Key: HIVE-1193
 URL: https://issues.apache.org/jira/browse/HIVE-1193
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0

 Attachments: hive.1193.1.patch


 If a table is sorted and data is being inserted into it, we currently 
 don't make sure that the data is sorted. Enforcing this might be useful for 
 some downstream operations.
 This cannot be made the default due to backward compatibility, but an option 
 can be added for it.




[jira] Updated: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1193:
-

Attachment: hive.1193.1.patch




[jira] Updated: (HIVE-1032) Better Error Messages for Execution Errors

2010-02-25 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1032:


Attachment: HIVE-1032.6.patch

* Fixed checkstyle issues

 Better Error Messages for Execution Errors
 --

 Key: HIVE-1032
 URL: https://issues.apache.org/jira/browse/HIVE-1032
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, 
 HIVE-1032.4.patch, HIVE-1032.5.patch, HIVE-1032.6.patch


 Three common errors that occur during execution are:
 1. Map-side group-by causing an out of memory exception due to large 
 aggregation hash tables
 2. ScriptOperator failing due to the user's script throwing an exception or 
 otherwise returning a non-zero error code
 3. Incorrectly specifying the join order of small and large tables, causing 
 the large table to be loaded into memory and producing an out of memory 
 exception.
 These errors are typically discovered by manually examining the error log 
 files of the failed task. This task proposes to create a feature that would 
 automatically read the error logs and output a probable cause and solution to 
 the command line.
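
 The proposal above (scan the task's error log and print a probable cause) 
 could be sketched roughly as follows. This is only an illustration, not the 
 attached patch: the class name, the rule table, and the script-failure log 
 pattern are all hypothetical, and only the java.lang.OutOfMemoryError text is 
 something the JVM is known to print.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class ErrorHeuristicSketch {
    // Hypothetical rule table: log pattern -> probable cause and suggested fix.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("java\\.lang\\.OutOfMemoryError"),
                "Probable cause: out of memory, e.g. a large aggregation hash table "
                + "or a large table loaded into memory for a join. "
                + "Possible fix: shrink the hash table or reorder the join.");
        // Illustrative pattern only; this is not an actual Hive log format.
        RULES.put(Pattern.compile("script exited with code [1-9][0-9]*"),
                "Probable cause: the user script failed. "
                + "Possible fix: run the script locally on sample input.");
    }

    // Returns the advice for the first log line that matches any rule, if any.
    static Optional<String> diagnose(List<String> logLines) {
        for (String line : logLines) {
            for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
                if (rule.getKey().matcher(line).find()) {
                    return Optional.of(rule.getValue());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INFO task started",
                "FATAL java.lang.OutOfMemoryError: Java heap space");
        System.out.println(diagnose(log).orElse("No known cause matched."));
    }
}
```

 Keeping the rules in a table means new failure signatures can be added 
 without touching the scanning loop.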




[jira] Updated: (HIVE-1032) Better Error Messages for Execution Errors

2010-02-25 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1032:


Status: Patch Available  (was: Open)




[jira] Created: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)
Fix CombineHiveInputFormat to work with multi-level of directories in a single 
table/partition
--

 Key: HIVE-1200
 URL: https://issues.apache.org/jira/browse/HIVE-1200
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.1, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


The CombineHiveInputFormat does not work with multiple levels of directories 
in a single table/partition, because it uses exact-match logic instead of the 
relativize logic used in MapOperator:

{code}
MapOperator.java:
  if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) {
{code}
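
To illustrate the difference between the two checks, here is a minimal sketch 
using plain java.net.URI rather than Hadoop's Path. The class, method names, 
and example paths are made up for this note. It relies on the documented 
behavior of URI.relativize, which returns its argument unchanged when the 
receiver is not a prefix of it.

```java
import java.net.URI;

class RelativizeSketch {
    // Prefix-style check, same shape as the quoted MapOperator condition:
    // relativize() hands back 'file' itself when 'dir' is not a prefix of it.
    static boolean isUnder(URI dir, URI file) {
        return !dir.relativize(file).equals(file);
    }

    // Exact-match check: true only when 'file' sits directly inside 'dir'.
    static boolean isDirectChild(URI dir, URI file) {
        String f = file.toString();
        return f.substring(0, f.lastIndexOf('/')).equals(dir.toString());
    }

    public static void main(String[] args) {
        URI dir = URI.create("hdfs://nn/warehouse/t/ds=1");
        URI nested = URI.create("hdfs://nn/warehouse/t/ds=1/sub/dir/part-0");
        System.out.println("exact match:  " + isDirectChild(dir, nested));
        System.out.println("prefix match: " + isUnder(dir, nested));
    }
}
```

With a file nested two levels below the partition directory, the exact-match 
check rejects it while the relativize-based check accepts it, which is the 
mismatch described above.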





[jira] Updated: (HIVE-1032) Better Error Messages for Execution Errors

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1032:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1032. Better Error Messages for Execution Errors. (Paul 
Yang via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Paul!

 Better Error Messages for Execution Errors
 --

 Key: HIVE-1032
 URL: https://issues.apache.org/jira/browse/HIVE-1032
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0

 Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, 
 HIVE-1032.4.patch, HIVE-1032.5.patch, HIVE-1032.6.patch


 Three common errors that occur during execution are:
 1. Map-side group-by causing an out of memory exception due to large 
 aggregation hash tables
 2. ScriptOperator failing due to the user's script throwing an exception or 
 otherwise returning a non-zero error code
 3. Incorrectly specifying the join order of small and large tables, causing 
 the large table to be loaded into memory and producing an out of memory 
 exception.
 These errors are typically discovered by manually examining the error log 
 files of the failed task. This task proposes to create a feature that would 
 automatically read the error logs and output a probable cause and solution to 
 the command line.




[jira] Commented: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838628#action_12838628
 ] 

He Yongqiang commented on HIVE-1193:


Looks good. Will test.




[jira] Created: (HIVE-1201) Add a python command-line interface for Hive

2010-02-25 Thread Zheng Shao (JIRA)
Add a python command-line interface for Hive


 Key: HIVE-1201
 URL: https://issues.apache.org/jira/browse/HIVE-1201
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Venky Iyer


Venky has a nice Python command-line interface for Hive. It uses the Thrift 
API to talk to the metastore and the hadoop command line to submit jobs.





[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-02-25 Thread Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838683#action_12838683
 ] 

Liu commented on HIVE-474:
--

We have implemented this feature using a union type, as mentioned by Zheng as A2.

 Support for distinct selection on two or more columns
 -

 Key: HIVE-474
 URL: https://issues.apache.org/jira/browse/HIVE-474
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Alexis Rondeau

 The ability to select distinct values of several individual columns, for example: 
 select count(distinct user), count(distinct session) from actions;   
 Currently this returns the following failure: 
 FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
 not Supported user
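
 For reference, the result the requested query should produce can be sketched 
 in plain Java. This only shows the intended semantics on in-memory rows; it 
 is not how Hive or any patch on this issue evaluates the query. The class 
 name and the row layout (user, session) are assumptions for the example, and 
 NULL handling is ignored.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MultiDistinctSketch {
    // Returns {count(distinct user), count(distinct session)} for rows
    // laid out as [user, session]; each column is tracked independently.
    static long[] countDistinct(List<String[]> rows) {
        Set<String> users = new HashSet<>();
        Set<String> sessions = new HashSet<>();
        for (String[] row : rows) {
            users.add(row[0]);
            sessions.add(row[1]);
        }
        return new long[] {users.size(), sessions.size()};
    }
}
```

 The two counts are independent of each other, which is why a single 
 DISTINCT over the column pair cannot stand in for them.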




[jira] Updated: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1193:
---

  Resolution: Fixed
Release Note: HIVE-1193. ensure sorting properties for a table.
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed! Thanks Namit!




Re: [jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-02-25 Thread jian yi
Hi Liu,

How did you implement support for distinct selection on two or more columns?

Regards
Jian

2010/2/26 Liu (JIRA) j...@apache.org


[
 https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838683#action_12838683]

 Liu commented on HIVE-474:
 --

 We have implemented this feature using a union type, as mentioned by Zheng
 as A2.





-- 
Hadoop Forum: http://bbs.hadoopor.com


[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1200:
-

Attachment: HIVE-1200.1.branch-0.5.patch
HIVE-1200.1.patch

 Fix CombineHiveInputFormat to work with multi-level of directories in a 
 single table/partition
 --

 Key: HIVE-1200
 URL: https://issues.apache.org/jira/browse/HIVE-1200
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.1, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch


 The CombineHiveInputFormat does not work with multiple levels of directories 
 in a single table/partition, because it uses exact-match logic instead of the 
 relativize logic used in MapOperator:
 {code}
 MapOperator.java:
   if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) {
 {code}




[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1200:
-

Status: Patch Available  (was: Open)




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838718#action_12838718
 ] 

Zheng Shao commented on HIVE-259:
-

Hi Jerome, using ArrayList<Integer> won't cause unnecessary Object creation. We 
will just create a single ArrayList<Integer> and use it forever.
Does that make sense?





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838735#action_12838735
 ] 

Todd Lipcon commented on HIVE-259:
--

Doesn't the autoboxing of Integer types actually allocate objects? I think the 
JVM only flyweights very small integers (iirc only from -128 to 127).
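
The boxing point can be checked with a small sketch. It is not taken from the 
patch, and the class and method names are made up. Autoboxing an int goes 
through Integer.valueOf, which is required to return a shared object for 
values from -128 to 127. Outside that range a new Integer is typically 
allocated per boxing, though this depends on the JVM and its settings, so the 
example prints the result instead of assuming it.

```java
class BoxingSketch {
    // Boxes the same int twice and compares references (deliberately ==, not equals).
    static boolean sameBoxedObject(int value) {
        Integer a = value; // autoboxing, equivalent to Integer.valueOf(value)
        Integer b = value;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("127 shared:  " + sameBoxedObject(127));
        System.out.println("1000 shared: " + sameBoxedObject(1000));
    }
}
```

So reusing a single ArrayList<Integer> avoids reallocating the list, but each 
added int outside the cached range may still cost one Integer object.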




[jira] Commented: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838737#action_12838737
 ] 

Zheng Shao commented on HIVE-1193:
--

Can we have a more detailed description on the JIRA?
The patch contains two properties, enforceBucketing and enforceSorting, but I 
don't see them described in the JIRA.

1. How do we make sure that the data is bucketed / sorted? By adding an 
additional map-reduce job?
2. What if the user already specified CLUSTER BY key in his query?
3. Do we disable merging of small files when we do this?





[jira] Created: (HIVE-1202) Unknown exception : null while join

2010-02-25 Thread Mafish (JIRA)
Unknown exception : null while join
-

 Key: HIVE-1202
 URL: https://issues.apache.org/jira/browse/HIVE-1202
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.1
 Environment: hive-0.4.1
hadoop 0.19.1
Reporter: Mafish
 Fix For: 0.4.1



Hive throws "Unknown exception : null" with the query:

select * from 
(
  select name from classes 
) a
  join classes b
where a.name  b.number

After tracing the code, I found that this bug occurs under the following
conditions:
1. The query is a join.
2. At least one of the join sources is a physical table (the right side in
the above case).
3. There is a where clause whose condition(s) include columns from both
sides of the join (a.name and b.number in this case).




[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-25 Thread Mafish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838743#action_12838743
 ] 

Mafish commented on HIVE-1202:
--

The call stack is:

org.apache.hadoop.hive.ql.session.SessionState$LogHelper.printError(SessionState.java:279) - FAILED: Unknown exception : null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.parse.QBMetaData.getTableForAlias(QBMetaData.java:76)
    at org.apache.hadoop.hive.ql.parse.ASTPartitionPruner.getTableColumnDesc(ASTPartitionPruner.java:298)
    at org.apache.hadoop.hive.ql.parse.ASTPartitionPruner.genExprNodeDesc(ASTPartitionPruner.java:220)
    at org.apache.hadoop.hive.ql.parse.ASTPartitionPruner.genExprNodeDesc(ASTPartitionPruner.java:234)
    at org.apache.hadoop.hive.ql.parse.ASTPartitionPruner.genExprNodeDesc(ASTPartitionPruner.java:234)
    at org.apache.hadoop.hive.ql.parse.ASTPartitionPruner.addExpression(ASTPartitionPruner.java:397)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPartitionPruners(SemanticAnalyzer.java:624)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4440)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:281)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

This bug occurs while Hive tries to prune table b: it takes the columns in the 
where clause, but those also include columns of table a. Thus, Hive fails to 
find the column name in table b.
