[jira] Commented: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-02-23 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837137#action_12837137
 ] 

Bennie Schut commented on HIVE-1019:


 Static inner classes in Java behave the same as stand-alone classes, or as 
 C++ inner classes.
 Non-static inner classes in Java have an implicit pointer to the parent class.

Ah, I incorrectly thought a static inner class would be like a normal static 
class (one per JVM). I'll undo that change. I learned something new today.
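
For illustration, a minimal sketch (not Hive code) of the distinction quoted above: a static nested class needs no enclosing instance, while a non-static inner class carries an implicit reference to one.

```java
// Minimal sketch (not Hive code) of the distinction quoted above.
public class InnerClassDemo {

    // Behaves like a stand-alone class: no implicit outer reference.
    static class StaticNested { }

    // Non-static inner class: carries an implicit pointer to the
    // enclosing InnerClassDemo instance.
    class Inner {
        InnerClassDemo enclosing() { return InnerClassDemo.this; }
    }

    public static void main(String[] args) {
        new StaticNested();                      // no enclosing instance needed
        InnerClassDemo outer = new InnerClassDemo();
        Inner inner = outer.new Inner();         // must be tied to an instance
        System.out.println(inner.enclosing() == outer); // prints true
    }
}
```

Note that neither kind is one-per-JVM; each `new` still creates an independent object.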

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019.patch, stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1184:


Status: Patch Available  (was: Open)

 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for an expression not 
 in the group by key is not displayed; instead it is null.
 {code}
 hive> select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive> select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}




[jira] Updated: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1184:


Attachment: HIVE-1184.1.patch

The problem stems from DefaultExprProcessor.process(). During the AST walk, any 
error is cleared if a node matches a group by expression:

{code}
...
  exprNodeDesc desc = TypeCheckProcFactory.processGByExpr(nd, procCtx);
  if (desc != null) {
ctx.setError(null);
return desc;
  }
{code}

Clearing the error was probably an attempt to address the generation of false 
errors during the DFS walk of the AST. Consider the AST fragment from the query

{code}
SELECT concat(src.key) FROM src GROUP BY concat(src.key)

        TOK_FUNCTION
         /        \
     concat        .
                 /   \
    TOK_TABLE_OR_COL  key
            |
           src
{code}

During the walk, process() will be called on src before TOK_FUNCTION. Because 
src is not a group by expression, an error will be set in ctx. However, when 
process() is called on TOK_FUNCTION, it matches the group by expression 
'concat(src.key)' and the error is cleared, producing the expected behavior.

A problem arises with a query like

{code}
select concat(value, concat(value)) from src group by concat(value)
{code}

as the AST is such that 'value' (1st argument of outer concat) is processed 
before 'concat(value)'. When process() acts on 'value', it sets an error 
because it is not a group by expression. But then the error is cleared when 
process() is called on 'concat(value)'. The error should not really be cleared 
as it was generated outside of the group by expression.

The proposed solution is to keep track of the ASTNode that generated the error 
and only clear the error when it was generated from a node within the group by 
expression.
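
A minimal sketch of that bookkeeping (names here are illustrative, not the actual patch): remember which node raised the error, and clear it only when that node sits inside the matched subtree.

```java
// Illustrative sketch of the proposed fix: remember which AST node raised
// the error, and clear it only when that node lies within the subtree that
// matched a group by expression. Names are hypothetical, not Hive's.
public class ErrorCtx {
    private String error;        // pending semantic error, if any
    private Object errorNode;    // AST node that raised it

    public void setError(String msg, Object node) {
        this.error = msg;
        this.errorNode = node;
    }

    // Called when a node matches a group by expression; 'subtreeNodes'
    // are the nodes of the matched subtree.
    public void clearErrorIfFrom(Object... subtreeNodes) {
        for (Object n : subtreeNodes) {
            if (error != null && n == errorNode) {
                error = null;
                errorNode = null;
                return;
            }
        }
    }

    public String getError() { return error; }
}
```

With this, the error raised by the outer 'value' survives the match on 'concat(value)', since that node is not part of the matched subtree.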

 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for an expression not 
 in the group by key is not displayed; instead it is null.
 {code}
 hive> select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive> select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}




[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-02-23 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1019:
---

Status: Patch Available  (was: Open)

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019.patch, 
 stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.




[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-02-23 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1019:
---

Attachment: HIVE-1019-6.patch

OK, removed the change to InputSplit and made a few more checkstyle fixes on 
SessionState.

SessionState = 14
Context = 1
CombineHiveInputFormat = 2
Utilities = 22
HiveInputFormat = 5

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019.patch, 
 stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.




[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Jerome Boulon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Attachment: HIVE-259-2.patch

Percentile function

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch


 Compute at least the 25th, 50th, and 75th percentiles




[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Jerome Boulon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Attachment: Percentile.xlsx
jb2.txt

Percentile test file + validation using the Excel PERCENTILE function:
CREATE TABLE JB2
(
duration bigint,
code string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/jb2.txt' INTO TABLE JB2;



Result:
hive> select percentile(duration,25,50,99) from JB2;
Ended Job = job_201002201654_0006
OK
[14.0,33.0,416.40001]
Time taken: 36.261 seconds

hive> select code,percentile(duration,25,50,99) from JB2 group by code;
Ended Job = job_201002201654_0007
OK
a   [2.0,17.5,427.22999]
b   [22.75,44.5,345.849997]
c   [18.0,29.0,58.765]
Time taken: 23.419 seconds
hive> quit;


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles




[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Jerome Boulon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Status: Patch Available  (was: In Progress)

Percentile function.
Usage: select code,percentile(MyColumnB,P1,P2,P3,Px) from MyTable group 
by myColumn;
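
The Excel PERCENTILE function used for validation above does linear interpolation between closest ranks; a standalone sketch of that method follows (an illustration only, not the patch's actual implementation; p is in [0, 100] as in the usage shown).

```java
import java.util.Arrays;

public class PercentileSketch {
    // Excel-style percentile: sort, take the fractional rank p/100 * (n-1),
    // and linearly interpolate between the two nearest sorted values.
    public static double percentile(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double rank = (p / 100.0) * (sorted.length - 1);
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        double frac = rank - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] durations = {1, 2, 3, 4};
        System.out.println(percentile(durations, 50)); // prints 2.5
    }
}
```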

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles




Build failed in Hudson: Hive-trunk-h0.20 #197

2010-02-23 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/197/changes

Changes:

[zshao] HIVE-1190. Configure build to download Hadoop tarballs from Facebook 
mirror. (John Sichi via zshao)

[nzhang] HIVE-1188. NPE when running TestJdbcDriver/TestHiveServer

[zshao] HIVE-1185. Fix RCFile resource leak when opening a non-RCFile. (He 
Yongqiang via zshao)

--
[...truncated 13322 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out
[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] 

[jira] Created: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread Carl Steinbach (JIRA)
Build fails when hadoop.version=0.20.1
--

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach


Setting hadoop.version=0.20.1 causes the build to fail since
mirror.facebook.net/facebook/hive-deps does not have 0.20.1
(only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).

Suggested fix:
* remove/ignore the hadoop.version configuration parameter

or

* Remove the patch numbers from these archives and use only the major.minor 
numbers specified by the user to locate the appropriate tarball to download, so 
0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
* Optionally create new tarballs that only contain the components that are 
actually needed for the build (Hadoop jars), and remove things that aren't 
needed (all of the source files).
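
The major.minor mapping suggested above could be sketched as follows (illustrative only; this helper is not part of the actual build):

```java
public class TarballName {
    // Map a full Hadoop version like "0.20.1" to a major.minor tarball
    // name, so 0.20.0 and 0.20.1 both resolve to hadoop-0.20.tar.gz.
    public static String tarballFor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        return "hadoop-" + parts[0] + "." + parts[1] + ".tar.gz";
    }

    public static void main(String[] args) {
        System.out.println(tarballFor("0.20.1")); // prints hadoop-0.20.tar.gz
    }
}
```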






[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837387#action_12837387
 ] 

John Sichi commented on HIVE-1192:
--

I'll get 0.20.1 added to the FB mirror.  Any others?

 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837395#action_12837395
 ] 

Carl Steinbach commented on HIVE-1192:
--

bq. I'll get 0.20.1 added to the FB mirror. Any others? 

Yes: 0.17.0, 0.17.1, 0.17.2, 0.18.1, 0.18.2, 0.19.1, and 0.19.2. I think that 
anything listed [here|http://archive.apache.org/dist/hadoop/core/] with a minor 
version >= 17 is fair game.

What do you think of my suggestion that we repackage these tarballs using names 
that drop the patch number?

 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837409#action_12837409
 ] 

John Sichi commented on HIVE-1192:
--

I don't think we should drop the patch number, since someone may want a 
specific version.  But we can improve the wiki docs about which versions can be 
obtained where and what to do if you don't care about patch number.


 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




RE: ivy/hadoop downloads status update

2010-02-23 Thread John Sichi
Update on this:

We got the new mirror.facebook.net/facebook/hive-deps archive set up, and I've 
tested it out and changed the build.properties configuration to use it by 
default (Zheng has already committed this to trunk).  If you have been 
experiencing Hadoop download problems from archive.apache.org, please give it a 
try and let us know if you still have trouble.

For now, we only included the specific Hadoop versions pulled when no Hadoop 
version is specified.  Carl pointed out in HIVE-1192 that this will cause a 
failure if you specify another version like 0.20.1.

If you encounter this situation, override build.properties to point 
hadoop.mirror to a mirror location which contains the version you want.  The 
only one that contains every version is archive.apache.org, but you should NOT 
use this unless you really need it for an old Hadoop version.  Instead, if it's 
one of the two recent Hadoop versions below, you can find it in any mirror 
(e.g. http://mirror.facebook.net/apache, which is separate from the hive-deps 
archive):

0.19.2
0.20.1

I'll update the wiki to have this information too.
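
The override described above amounts to one line in build.properties (the mirror path shown is the one given above):

```properties
# Override the default tarball source with a mirror that carries
# the Hadoop version you need (e.g. 0.19.2 or 0.20.1).
hadoop.mirror=http://mirror.facebook.net/apache
```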

In a week or so, we'll be provisioning more resources to 
mirror.facebook.net/facebook/hive-deps so that we can serve any commonly needed 
Hadoop version from there.

JVS

From: John Sichi
Sent: Thursday, February 18, 2010 11:26 AM
To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org
Subject: ivy/hadoop downloads status update

Hi all,

We're working on the problem in HIVE-984 which a number of people have been 
hitting, and with luck we'll have a resolution within the next week or so.  To 
address it, we're setting up http://mirror.facebook.net to serve the Hadoop 
versions needed by Hive, with better availability and reliability than 
http://archive.apache.org.  Once we've got that deployed and tested out, I'll 
submit a patch to build.properties to point hadoop.mirror to the new location.

For now, if you need to get unblocked, use the workaround documented by Carl in 
the comments of that JIRA issue:

http://tinyurl.com/yajydoy

Thanks for your patience,
JVS



[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837420#action_12837420
 ] 

Carl Steinbach commented on HIVE-1192:
--

For this particular application I don't think the patch numbers matter. We 
download the Hadoop tarballs in order to obtain the jars which are referenced 
when the shims are built in order to satisfy the compiler. Since the API does 
not change between patch versions, there is no difference between using 0.20.1 
and 0.20.0, etc., for this step. Subsequently, the contents of these tarballs 
are not referenced at all -- we expect users to supply their own installation 
of Hadoop. I think it's a bug that our build currently thinks there is a 
difference between 0.20.0 and 0.20.1. Am I missing something?



 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837436#action_12837436
 ] 

John Sichi commented on HIVE-1192:
--

hadoop.version is used for running tests too (not just for building shims).

From test target in build-common.xml:

  <env key="HADOOP_HOME" value="${hadoop.root}"/>

From build.properties:

hadoop.version.ant-internal=${hadoop.version}
hadoop.root.default=${build.dir.hadoop}/hadoop-${hadoop.version.ant-internal}
hadoop.root=${hadoop.root.default}



 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837445#action_12837445
 ] 

John Sichi commented on HIVE-1192:
--

Wiki docs updated here:

http://wiki.apache.org/hadoop/Hive/HowToContribute#Hadoop_Dependencies


 Build fails when hadoop.version=0.20.1
 --

 Key: HIVE-1192
 URL: https://issues.apache.org/jira/browse/HIVE-1192
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach

 Setting hadoop.version=0.20.1 causes the build to fail since
 mirror.facebook.net/facebook/hive-deps does not have 0.20.1
 (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
 Suggested fix:
 * remove/ignore the hadoop.version configuration parameter
 or
 * Remove the patch numbers from these archives and use only the major.minor 
 numbers specified by the user to locate the appropriate tarball to download, 
 so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
 * Optionally create new tarballs that only contain the components that are 
 actually needed for the build (Hadoop jars), and remove things that aren't 
 needed (all of the source files).




[jira] Updated: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1184:
-

Status: Open  (was: Patch Available)

 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for an expression not 
 in the group by key is not displayed; instead it is null.
 {code}
 hive> select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive> select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}




[jira] Commented: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837476#action_12837476
 ] 

Zheng Shao commented on HIVE-1184:
--

The explanation looks good to me, but I am not convinced the solution will 
solve the problem.

When processing concat(value, concat(value)), we will set the error when 
processing the first value, then overwrite the error when processing the 
second value, correct?
I think the error should be part of the return value of the process 
function, instead of a global field in the context.

Does that make sense?
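
Returning the error alongside the result, as suggested, might look roughly like this (hypothetical, simplified names, not the actual Hive types):

```java
// Hypothetical sketch (not the actual Hive types): process() returns its
// error together with its result, so there is no shared mutable error
// field for an unrelated subtree to clear or overwrite.
public class ExprResult {
    private final Object desc;   // resolved expression descriptor, or null
    private final String error;  // semantic error message, or null

    private ExprResult(Object desc, String error) {
        this.desc = desc;
        this.error = error;
    }

    public static ExprResult ok(Object desc) { return new ExprResult(desc, null); }

    public static ExprResult fail(String msg) { return new ExprResult(null, msg); }

    public boolean isError() { return error != null; }

    public Object getDesc() { return desc; }

    public String getError() { return error; }
}
```

A parent node that matches a group by expression would simply discard its children's results, so a child's error could neither leak out of nor be cleared by a sibling subtree.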


 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for a expression not 
 in group key is not displayed; instead it is null.
 {code}
 hive select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}




[jira] Commented: (HIVE-1189) Add package-info.java to Hive

2010-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837483#action_12837483
 ] 

John Sichi commented on HIVE-1189:
--

Can you think of a way to add an automated negative test?

Regarding the check itself, you are comparing the version, but not the 
revision.  Don't we need to check both?
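
A check covering both fields could be as simple as the following (illustrative only, not the patch's code):

```java
public class BuildVersion {
    // Compare both fields, as suggested above: two builds match only
    // when version AND revision agree.
    public static boolean sameBuild(String versionA, String revisionA,
                                    String versionB, String revisionB) {
        return versionA.equals(versionB) && revisionA.equals(revisionB);
    }
}
```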



 Add package-info.java to Hive
 -

 Key: HIVE-1189
 URL: https://issues.apache.org/jira/browse/HIVE-1189
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1189.1.patch


 Hadoop automatically generates build/src/org/apache/hadoop/package-info.java 
 with information like this:
 {code}
 /*
  * Generated by src/saveVersion.sh
  */
 @HadoopVersionAnnotation(version="0.20.2-dev", revision="826568",
  user="zshao", date="Sun Oct 18 17:46:56 PDT 2009",
 url="http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20")
 package org.apache.hadoop;
 {code}
 Hive should do the same thing so that we can easily know the version of the 
 code at runtime.
 This will help us identify whether we are still running the same version of 
 Hive, if we serialize the plan and later continue the execution (See 
 HIVE-1100).
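For illustration, a version annotation like this can be read back at runtime via reflection. A self-contained sketch follows; the annotation and class names here are invented for the example (Hadoop's real annotation is generated into package-info.java and read from java.lang.Package):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical annotation mirroring the shape of HadoopVersionAnnotation;
// applied to a class here (rather than a package) to keep the example
// in one file.
@Retention(RetentionPolicy.RUNTIME)
@interface VersionAnnotation {
    String version();
    String revision();
}

@VersionAnnotation(version = "0.6.0-dev", revision = "901234")
public class VersionInfo {
    // Look the annotation up reflectively at runtime.
    public static String getVersion() {
        VersionAnnotation ann =
            VersionInfo.class.getAnnotation(VersionAnnotation.class);
        return ann == null ? "Unknown" : ann.version();
    }

    public static void main(String[] args) {
        System.out.println("Version: " + getVersion()); // prints Version: 0.6.0-dev
    }
}
```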

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837500#action_12837500
 ] 

Carl Steinbach commented on HIVE-259:
-

Please fix the new Checkstyle errors in UDAFPercentile.java:

35: Missing a Javadoc comment.
39: Missing a Javadoc comment.
39:10: 'public' modifier out of order with the JLS suggestions.
41: Missing a Javadoc comment.
41:12: 'public' modifier out of order with the JLS suggestions.
42:15: Variable 'initDone' must be private and have accessor methods.
43:7: Declaring variables, return values or parameters of type 'HashMap' is not 
allowed.
43:35: Variable 'counts' must be private and have accessor methods.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.
44:26: Variable 'percentiles' must be private and have accessor methods.
47: Missing a Javadoc comment.
47:12: 'public' modifier out of order with the JLS suggestions.
56:11: Variable 'state' must be private and have accessor methods.
82:43: Name '_percentiles' must match pattern '^[a-z][a-zA-Z0-9]*$'.
85:28: Expression can be simplified.
105:39: ')' is preceded with whitespace.
117:26: Expression can be simplified.
125:65: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
129:12: Name 'CRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
130:12: Name 'FRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.
173: Line is longer than 100 characters.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.
188:12: Name 'N' must match pattern '^[a-z][a-zA-Z0-9]*$'.
189:14: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
191:16: Name 'P' must match pattern '^[a-z][a-zA-Z0-9]*$'.
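For a couple of the violations above (modifier order, interface-typed declarations, private fields with accessors), a hedged before/after sketch; the field names are borrowed from the report but the surrounding class is invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative fix for several Checkstyle complaints: fields are private
 * with accessor methods, and are declared with the interface types
 * (Map/List) rather than the concrete types (HashMap/ArrayList).
 */
public class PercentileState {
    private boolean initDone = false;
    private Map<Double, Long> counts = new HashMap<Double, Long>();
    private List<Double> percentiles = new ArrayList<Double>();

    public boolean isInitDone() { return initDone; }
    public Map<Double, Long> getCounts() { return counts; }
    public List<Double> getPercentiles() { return percentiles; }
}
```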


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles
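As background on the RN/CRN/FRN names flagged by Checkstyle above, a hedged sketch of linear-interpolation percentile selection (the PERCENTILE_CONT style: fractional row number, floor row, ceiling row); this is not Hive's actual UDAFPercentile code:

```java
public class PercentileSketch {
    // p in [0, 1]; values must be sorted ascending.
    public static double percentile(double[] sorted, double p) {
        double rn = 1 + p * (sorted.length - 1); // fractional row number (RN)
        int frn = (int) Math.floor(rn);          // floor row number (FRN)
        int crn = (int) Math.ceil(rn);           // ceiling row number (CRN)
        if (frn == crn) {
            return sorted[frn - 1]; // landed exactly on a row
        }
        // Interpolate linearly between the two neighboring rows.
        return (crn - rn) * sorted[frn - 1] + (rn - frn) * sorted[crn - 1];
    }

    public static void main(String[] args) {
        double[] v = {1, 2, 3, 4};
        System.out.println(percentile(v, 0.5)); // median of an even-sized list
    }
}
```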

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Jerome Boulon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837522#action_12837522
 ] 

Jerome Boulon commented on HIVE-259:


@Carl: How did you get this list?

Also, I'm not sure I understand this: why are HashMap and ArrayList not 
allowed if they are supported?

43:7: Declaring variables, return values or parameters of type 'HashMap' is not 
allowed.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is 
not allowed.


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Alex Loddengaard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837526#action_12837526
 ] 

Alex Loddengaard commented on HIVE-259:
---

Hey Jerome,

I assume it's because you're supposed to use the interface type (e.g. Map or 
List) for return types, parameter types, and declaring variables.

Correct me if I'm wrong, those of you more knowledgeable about Hive's 
checkstyle :).

Alex

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837527#action_12837527
 ] 

Carl Steinbach commented on HIVE-259:
-

bq. How did you get this list? 

Run 'ant checkstyle'. The list of violations gets dumped to 
build/checkstyle/checkstyle-errors.txt.

bq. Why HashMap and ArrayList are not allowed if supported?

You're allowed to use ArrayList and HashMap, but you're supposed to refer
to instances of these classes using the interface (List or Map) instead of the
concrete type, e.g.

{code:java}
Map<String, String> myMap = new HashMap<String, String>();

public List<String> getStringList() {
   return new ArrayList<String>();
}
{code}



 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1148) Add Checkstyle documentation to developer guide

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1148.
--

Resolution: Fixed

Added directions to the HowToContribute wiki page: 
http://wiki.apache.org/hadoop/Hive/HowToContribute


 Add Checkstyle documentation to developer guide
 ---

 Key: HIVE-1148
 URL: https://issues.apache.org/jira/browse/HIVE-1148
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 Add checkstyle documentation to the Hive developer manual.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1098) Fix Eclipse launch configurations

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1098.
--

Resolution: Duplicate

 Fix Eclipse launch configurations
 -

 Key: HIVE-1098
 URL: https://issues.apache.org/jira/browse/HIVE-1098
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.5.0, 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 All of the Eclipse launch configurations in eclipse-templates are currently 
 broken.
 The configurations reference hive_model.jar, which no longer exists, but there
 appear to be other problems as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-984) Building Hive occasionally fails with Ivy error: hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-984.
-

Resolution: Duplicate

Fixed in HIVE-1190.

 Building Hive occasionally fails with Ivy error: 
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 ---

 Key: HIVE-984
 URL: https://issues.apache.org/jira/browse/HIVE-984
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-984.2.patch, HIVE-984.patch


 Folks keep running into this problem when building Hive from source:
 {noformat}
 [ivy:retrieve]
 [ivy:retrieve] :: problems summary ::
 [ivy:retrieve]  WARNINGS
 [ivy:retrieve]  [FAILED ]
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
 (138662ms)
 [ivy:retrieve]  [FAILED ]
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
 (138662ms)
 [ivy:retrieve]   hadoop-resolver: tried
 [ivy:retrieve]
 http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
 [ivy:retrieve]  ::
 [ivy:retrieve]  ::  FAILED DOWNLOADS::
 [ivy:retrieve]  :: ^ see resolution messages for details  ^ ::
 [ivy:retrieve]  ::
 [ivy:retrieve]  :: hadoop#core;0.20.1!hadoop.tar.gz(source)
 [ivy:retrieve]  ::
 [ivy:retrieve]
 [ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
 {noformat}
 The problem appears to be either with a) the Hive build scripts, b) ivy, or 
 c) archive.apache.org
 Besides fixing the actual bug, one other option worth considering is to add 
 the Hadoop jars to the
 Hive source repository.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1187) Implement ddldump utility for Hive Metastore

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837563#action_12837563
 ] 

Carl Steinbach commented on HIVE-1187:
--

@Ed: I don't think there is much overlap between this enhancement and what is 
described in HIVE-1161. I expect that HIVE-1161 will involve triggering the 
transfer of metadata over Thrift or JDBC from one metastore to another, or via 
an agent that accesses both metastores via the client APIs. Dumping DDL from 
one metastore and replaying it on another seems like a hacky way to 
synchronize/transmit metadata.

 Implement ddldump utility for Hive Metastore
 

 Key: HIVE-1187
 URL: https://issues.apache.org/jira/browse/HIVE-1187
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 Implement a ddldump utility for the Hive metastore that will generate the QL 
 DDL necessary to recreate the state of the current metastore on another 
 metastore instance.
 A major use case for this utility is migrating a metastore from one database 
 to another, e.g. from an embedded Derby instance to a MySQL instance.
 The ddldump utility should support the following features:
 * Ability to generate DDL for specific tables or all tables.
 * Ability to specify a table name prefix for the generated DDL, which will be 
 useful for resolving table name conflicts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1161) Hive Replication

2010-02-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837565#action_12837565
 ] 

Carl Steinbach commented on HIVE-1161:
--

I hear that the folks at Facebook already have a system in production
that does something similar to what Ed described above. Can someone
from the FB team offer more details?


 Hive Replication
 

 Key: HIVE-1161
 URL: https://issues.apache.org/jira/browse/HIVE-1161
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Contrib
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor

 Users may want to replicate data between two distinct hadoop clusters or two 
 hive warehouses on the same cluster.
 Users may want to replicate entire catalogs or possibly on a table-by-table 
 basis. Should this process be batch driven or be a full-time running 
 application? What are some practical requirements, and what are the limitations?
 Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1080) Test coverage for ExecDriver when running tests in local mode

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-1080:


Assignee: (was: Carl Steinbach)

 Test coverage for ExecDriver when running tests in local mode
 -

 Key: HIVE-1080
 URL: https://issues.apache.org/jira/browse/HIVE-1080
 Project: Hadoop Hive
  Issue Type: Test
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Carl Steinbach

 Add a distributed-mode test to the test suite in order to guarantee test 
 coverage of ExecDriver.
 Filing this ticket as a follow-up to HIVE-1064.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1137) build references IVY_HOME incorrectly

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1137:
-

Attachment: HIVE-1137.patch

 build references IVY_HOME incorrectly
 -

 Key: HIVE-1137
 URL: https://issues.apache.org/jira/browse/HIVE-1137
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Carl Steinbach
 Fix For: 0.6.0

 Attachments: HIVE-1137.patch


 The build references env.IVY_HOME, but doesn't actually import env as it 
 should (via <property environment="env"/>).
 It's not clear what the IVY_HOME reference is for since the build doesn't 
 even use ivy.home (instead, it installs under the build/ivy directory).
 It looks like someone copied bits and pieces from the Automatically section 
 here:
 http://ant.apache.org/ivy/history/latest-milestone/install.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1137) build references IVY_HOME incorrectly

2010-02-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1137:
-

Status: Patch Available  (was: Open)

* Removed the reference to IVY_HOME in build-common.xml
** ivy.home is always set to ${user.home}/.ant unless the user overrides 
ivy.home.
* Removed ivy/get_ivy.xml which was not being used (this is handled by the 
ivy-download target in build-common.xml)


 build references IVY_HOME incorrectly
 -

 Key: HIVE-1137
 URL: https://issues.apache.org/jira/browse/HIVE-1137
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Carl Steinbach
 Fix For: 0.6.0

 Attachments: HIVE-1137.patch


 The build references env.IVY_HOME, but doesn't actually import env as it 
 should (via <property environment="env"/>).
 It's not clear what the IVY_HOME reference is for since the build doesn't 
 even use ivy.home (instead, it installs under the build/ivy directory).
 It looks like someone copied bits and pieces from the Automatically section 
 here:
 http://ant.apache.org/ivy/history/latest-milestone/install.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1187) Implement ddldump utility for Hive Metastore

2010-02-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837583#action_12837583
 ] 

Edward Capriolo commented on HIVE-1187:
---

+1 Having 'show create table' will be very useful. I always seem to 
lose those table definitions.

 Implement ddldump utility for Hive Metastore
 

 Key: HIVE-1187
 URL: https://issues.apache.org/jira/browse/HIVE-1187
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 Implement a ddldump utility for the Hive metastore that will generate the QL 
 DDL necessary to recreate the state of the current metastore on another 
 metastore instance.
 A major use case for this utility is migrating a metastore from one database 
 to another, e.g. from an embedded Derby instance to a MySQL instance.
 The ddldump utility should support the following features:
 * Ability to generate DDL for specific tables or all tables.
 * Ability to specify a table name prefix for the generated DDL, which will be 
 useful for resolving table name conflicts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1186) testParse_groupby fail on openjdk6

2010-02-23 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1186:
-

Fix Version/s: (was: 0.5.0)
   0.6.0

 testParse_groupby fail on openjdk6
 --

 Key: HIVE-1186
 URL: https://issues.apache.org/jira/browse/HIVE-1186
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: OpenJDK6 on debian
Reporter: Johan Oskarsson
 Fix For: 0.6.0


 Using openjdk6, the testParse_groupby tests 1 through 6 fail. I installed the 
 Sun JDK and all the tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.