[jira] Updated: (HIVE-1179) Add UDF array_contains

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1179:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179-3.patch, 
> HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}
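For reference, a minimal Java sketch of the membership test such a UDF performs (an illustration with assumed null handling, not Hive's actual implementation):

{code}
import java.util.Arrays;
import java.util.List;

public class ArrayContainsSketch {
    // True if element appears in theArray; nulls yield false here (an assumption).
    static boolean arrayContains(Object element, List<?> theArray) {
        if (element == null || theArray == null) {
            return false;
        }
        for (Object item : theArray) {
            if (element.equals(item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(arrayContains(2, Arrays.asList(1, 2, 3))); // true
        System.out.println(arrayContains(5, Arrays.asList(1, 2, 3))); // false
    }
}
{code}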

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1410) Add TCP keepalive option for the metastore server

2010-06-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1410:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Paul!

> Add TCP keepalive option for the metastore server
> -
>
> Key: HIVE-1410
> URL: https://issues.apache.org/jira/browse/HIVE-1410
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0
>
> Attachments: HIVE-1410.1.patch
>
>
> In production, we have noticed that the metastore server tends to accumulate 
> half-open TCP connections when it has been running for a long time. By 
> half-open, I am referring to idle connections where there is an established 
> TCP connection on the metastore server machine, but no corresponding 
> connection on the client machine. This could be the result of network 
> disconnects or crashed clients.
> This patch will add an option to turn on TCP keepalive so that these 
> half-open connections will get cleaned up.
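For illustration, a minimal java.net sketch of enabling keepalive on accepted connections (an assumption for clarity; the actual patch wires the option through the Thrift server socket, and the port below is hypothetical):

{code}
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveServerSketch {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(9083); // hypothetical listen port
        while (true) {
            Socket client = server.accept();
            // SO_KEEPALIVE makes the kernel probe idle connections, so
            // half-open ones are eventually detected and torn down.
            client.setKeepAlive(true);
            // ... hand the socket off to a worker thread here ...
        }
    }
}
{code}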

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Status: Patch Available  (was: Open)

> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> An example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation inline, the way Hadoop & HBase do, 
> using Forrest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: jdom-1.1.LICENSE

> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> An example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation inline, the way Hadoop & HBase do, 
> using Forrest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1135-3-patch.txt

Fixed all items.

> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> An example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation inline, the way Hadoop & HBase do, 
> using Forrest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1397:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Mayank!


> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch, 
> HIVE-1397.2.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.
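The heart of the algorithm is compact: keep at most N (center, count) bins, and when an insert would exceed N bins, merge the two closest adjacent bins into their count-weighted mean. A minimal Java sketch of that merge step (an illustration of the paper's idea, not the code in the attached patches):

{code}
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of the Ben-Haim & Tom-Tov bin-merge step.
public class StreamingHistogramSketch {
    private final int maxBins;
    private final TreeMap<Double, Long> bins = new TreeMap<Double, Long>(); // center -> count

    public StreamingHistogramSketch(int maxBins) {
        this.maxBins = maxBins;
    }

    public void add(double x) {
        bins.merge(x, 1L, Long::sum);
        if (bins.size() <= maxBins) {
            return;
        }
        // Find the two adjacent bins with the smallest gap between centers.
        double bestGap = Double.MAX_VALUE, left = 0, right = 0;
        Double prev = null;
        for (double center : bins.keySet()) {
            if (prev != null && center - prev < bestGap) {
                bestGap = center - prev;
                left = prev;
                right = center;
            }
            prev = center;
        }
        // Merge them into one bin at their count-weighted mean.
        long c1 = bins.remove(left), c2 = bins.remove(right);
        bins.merge((left * c1 + right * c2) / (c1 + c2), c1 + c2, Long::sum);
    }

    public Map<Double, Long> snapshot() {
        return new TreeMap<Double, Long>(bins);
    }
}
{code}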

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-802:


Fix Version/s: (was: 0.6.0)

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.5.0
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.
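The linked report suggests why '+' is special: DataNucleus URL-decodes classpath entries, and '+' decodes to a space. A minimal Java illustration of that failure mode (an assumption about the root cause, not code from Hive or DataNucleus):

{code}
import java.net.URLDecoder;

public class PlusInPathSketch {
    public static void main(String[] args) throws Exception {
        String jarUrlPath = "/home/build/hive+trunk/lib/hive-exec.jar"; // hypothetical
        // URL decoding turns '+' into a space, so the decoded path no longer
        // points at a real file and plugin registration fails.
        System.out.println(URLDecoder.decode(jarUrlPath, "UTF-8"));
        // prints: /home/build/hive trunk/lib/hive-exec.jar
    }
}
{code}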

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-802.
-

Resolution: Duplicate

Fix incorporated in HIVE-1176

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.5.0
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



branching for 0.6

2010-06-15 Thread John Sichi
Hi all,

Ashish, Namit, Paul, Ning, Yongqiang and I met with Carl Steinbach today to 
kick off some planning for the 0.6.0 release of Hive.  As noted by Ashish in an 
earlier message, Carl will be driving the release management for 0.6, so please 
give him any support he needs to get out a quality release soon.  As a first 
estimate, we agreed to target July 15 for the release date, with the release 
managed as time-based rather than feature-based.  Carl will be reviewing the 
state of all features which have gone in since 0.5 and working with 
contributors to make sure that their functionality and development status (e.g. 
experimental vs stable) is documented as part of the release.

We propose to cut the 0.6 branch this weekend.  Changes which get committed to 
trunk by the end of this Friday (June 18) will automatically be included in 
0.6.  For anything committed afterwards which needs to be included in the 
release, it will be necessary to explicitly request that it be backported to 
0.6, and it will be the responsibility of the JIRA assignee (not the committer) 
to provide both the trunk patch as well as the backported patch (in cases where 
the trunk patch does not apply cleanly against the branch).

We did a brief triage of JIRA and selected a short list of issues as 
candidates to try to get committed into the 0.6 release (not necessarily before 
the branch is cut).  For other issues previously marked as Fix Version 0.6, 
I've changed the Fix Version to None.  The selection was ad hoc (most existing 
issues with available patches were included, plus some others people thought 
should be ready in time).  If you feel strongly about other issues, feel free 
to raise them as additional candidates; attention will be given to those with 
up-to-date patches available.  But do keep in mind that we're shooting for a 
time-based release, so if a particular issue doesn't make the cut, get it on 
the 0.7.0 train instead and start working towards that.  The overall benefit of 
having a more frequent and regular release series should be a big win for 
everyone.

Here's the latest list from JIRA:

http://tinyurl.com/2f9k8ff

Note that to avoid spamming, I disabled JIRA email updates while bulk-updating 
existing issues.

Stay tuned for more from Carl as the release date approaches!

JVS



[jira] Updated: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-802:


Affects Version/s: 0.5.0

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.5.0
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-802:


Status: Open  (was: Patch Available)

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.5.0
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1359) Unit test should be shim-aware

2010-06-15 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1359:
-

Fix Version/s: 0.6.0

> Unit test should be shim-aware
> --
>
> Key: HIVE-1359
> URL: https://issues.apache.org/jira/browse/HIVE-1359
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: unit_tests.txt
>
>
> Some features in Hive work only with certain Hadoop versions, through the 
> shim layer. However, the unit test structure is not shim-aware: there is only 
> one set of queries and expected outputs for all Hadoop versions. This is not 
> sufficient when the output differs across Hadoop versions. 
> One example is CombineHiveInputFormat, which is only available from Hadoop 
> 0.20; the plans produced with CombineHiveInputFormat and HiveInputFormat may 
> differ. Another example is archived partitions (HAR), which are also only 
> available from 0.20. 
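One possible shape for version-specific expected outputs (a hypothetical sketch, not the proposal in the attached unit_tests.txt): look for a golden file suffixed with the Hadoop version, and fall back to the shared one.

{code}
import java.io.File;

public class GoldenFileSketch {
    // Hypothetical lookup: prefer results/foo.q.out.0.20 for Hadoop 0.20 and
    // fall back to the shared results/foo.q.out when no versioned file exists.
    static File expectedOutput(File resultsDir, String qfile, String hadoopVersion) {
        File versioned = new File(resultsDir, qfile + ".out." + hadoopVersion);
        return versioned.exists() ? versioned : new File(resultsDir, qfile + ".out");
    }

    public static void main(String[] args) {
        System.out.println(expectedOutput(new File("results"), "combine1.q", "0.20"));
    }
}
{code}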

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-06-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879233#action_12879233
 ] 

Edward Capriolo commented on HIVE-1096:
---

This is my next target after the xdoc ticket. I should not be too far off.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing this in compile unless someone wants to 
> discuss it more.
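A minimal Java sketch of the kind of pre-compile substitution described above, using the ${VAR} syntax from the quoted mail (hypothetical code, not the attached patches):

{code}
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSubstitutionSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace each ${NAME} with its value; unknown variables are left as-is.
    static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(out,
                Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("SELECT * FROM logs WHERE ds = '${DT}'",
            Collections.singletonMap("DT", "2009-12-09")));
        // prints: SELECT * FROM logs WHERE ds = '2009-12-09'
    }
}
{code}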

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-543) provide option to run hive in local mode

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-543:


Affects Version/s: 0.5.0
  Component/s: Query Processor

> provide option to run hive in local mode
> 
>
> Key: HIVE-543
> URL: https://issues.apache.org/jira/browse/HIVE-543
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Fix For: 0.6.0
>
> Attachments: hive-534.patch.2, hive-543.patch.1
>
>
> this is a little bit more than just mapred.job.tracker=local
> when run in this mode, multiple jobs are an issue since they write to the 
> same tmp directories. the following options:
> hadoop.tmp.dir
> mapred.local.dir
> need to be randomized (perhaps based on the queryid). 
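A sketch of that randomization using the plain Hadoop Configuration API (the paths and the queryid-based layout are assumptions for illustration):

{code}
import org.apache.hadoop.conf.Configuration;

public class LocalModeConfSketch {
    // Point per-job scratch dirs at query-specific locations so concurrent
    // local-mode jobs do not write into the same tmp directories.
    static void configureLocalMode(Configuration conf, String queryId) {
        conf.set("mapred.job.tracker", "local");
        conf.set("hadoop.tmp.dir", "/tmp/hive-local/" + queryId);
        conf.set("mapred.local.dir", "/tmp/hive-local/" + queryId + "/mapred");
    }
}
{code}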

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1095) Hive in Maven

2010-06-15 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1095:
-

Fix Version/s: (was: 0.6.0)

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1255:
--

Attachment: hive-1255-patch-4.txt

show_functions.q is patched

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan
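Most of these map directly onto java.lang.Math. A simplified sketch of one such UDF (illustrative only; the real UDFs in the patch use Hive's writable types rather than boxed Java doubles):

{code}
import org.apache.hadoop.hive.ql.exec.UDF;

// Simplified sketch: degrees(r) converts radians to degrees via java.lang.Math.
public class UDFDegreesSketch extends UDF {
    public Double evaluate(Double radians) {
        return radians == null ? null : Math.toDegrees(radians);
    }
}
{code}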

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1255:
--

Status: Patch Available  (was: Open)

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879218#action_12879218
 ] 

Paul Yang commented on HIVE-1411:
-

Taking a look.

> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.4.0, 0.4.1, 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1411.patch.txt
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpath. The URL 
> "file:/Users/hadop/hadoop-0.20.1+152/build/ivy/lib/Hadoo 
> p/common/core-3.1.1.jar" is already registered, and you are trying to 
> register an identical plugin located at URL "file:/Users/hado 
> p/hadoop-0.20.1+152/lib/core-3.1.1.jar." 
> at 

[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1255:


Status: Open  (was: Patch Available)

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-15 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879216#action_12879216
 ] 

Paul Yang commented on HIVE-1255:
-

Can you update show_functions.q? That test will fail with the addition of new 
UDFs.

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-15 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879212#action_12879212
 ] 

Paul Yang commented on HIVE-1255:
-

Taking a look.

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-15 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879208#action_12879208
 ] 

Paul Yang commented on HIVE-1176:
-

Can you elaborate on what you mean by 'some collections were being fetched as 
semi-populated proxies with missing session context leading to NPEs'? Is there 
something I can do to reproduce this?

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176-1.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1007) CombinedHiveInputFormat fails with empty input

2010-06-15 Thread Dave Lerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879206#action_12879206
 ] 

Dave Lerman commented on HIVE-1007:
---

Tried the steps above on Hadoop 0.20.2.  It still logs "number of splits 0", 
but instead of hanging at 0%, it completes the map, then fails with a "Shuffle 
Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out." in the reduce.  This 
is clearly preferable to hanging and doesn't seem like a totally unreasonable 
result when asking for data from an empty file -- should I close the ticket?

> CombinedHiveInputFormat fails with empty input
> --
>
> Key: HIVE-1007
> URL: https://issues.apache.org/jira/browse/HIVE-1007
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.1
>Reporter: Dave Lerman
>Assignee: Dave Lerman
> Attachments: hive.1007.1.patch
>
>
> In a multi-stage query, when one stage returns no data (resulting in a bunch 
> of output files with size 0), the next stage creates a job with 0 mappers 
> which just sits in the Hadoop task tracker forever and hangs the query at 0%. 
> The issue is that CombineHiveInputFormat looks for blocks to populate splits, 
> finds none (since the input is all 0 bytes), and then returns an empty array 
> from getSplits.
> There may be a good way to just skip that job altogether, but as a quick hack 
> to get it working, when there are no splits, I just create a single empty one 
> using the first path so that the job doesn't hang.
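The shape of that hack, as a hedged sketch against the old mapred API (names here are illustrative, not the attached hive.1007.1.patch verbatim):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;

public class EmptySplitSketch {
    // When no blocks produced any splits, hand back one zero-length split on
    // the first input path so the job runs to completion instead of hanging.
    static InputSplit[] padIfEmpty(List<InputSplit> splits, Path firstPath) {
        List<InputSplit> result = new ArrayList<InputSplit>(splits);
        if (result.isEmpty()) {
            result.add(new FileSplit(firstPath, 0L, 0L, new String[0]));
        }
        return result.toArray(new InputSplit[result.size()]);
    }
}
{code}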

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1411:
-

Status: Patch Available  (was: Open)

This patch adds the property "datanucleus.plugin.pluginRegistryBundleCheck" to 
hive-site.xml and sets the value to LOG. When not set, this property defaults 
to EXCEPTION, which results in an exception being thrown if the CLASSPATH 
contains two or more JARs with the same name.
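For illustration, the same setting expressed programmatically against HiveConf (a sketch only; the patch itself adds the property to hive-site.xml):

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class PluginRegistrySketch {
    public static void main(String[] args) {
        HiveConf conf = new HiveConf();
        // LOG duplicate plugin bundles instead of throwing (default: EXCEPTION).
        conf.set("datanucleus.plugin.pluginRegistryBundleCheck", "LOG");
        System.out.println(conf.get("datanucleus.plugin.pluginRegistryBundleCheck"));
    }
}
{code}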


> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0, 0.4.1, 0.4.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1411.patch.txt
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpa

[jira] Commented: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879198#action_12879198
 ] 

Carl Steinbach commented on HIVE-1411:
--

A little background on what DataNucleus is trying to do: 
http://www.datanucleus.org/extensions/plugins.html

Definition of pluginRegistryBundleCheck property: 
http://www.datanucleus.org/products/accessplatform_1_0/persistence_properties.html#general



> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.4.0, 0.4.1, 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1411.patch.txt
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpath. The URL 

[jira] Updated: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1411:
-

Attachment: HIVE-1411.patch.txt

> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.4.0, 0.4.1, 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1411.patch.txt
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpath. The URL 
> "file:/Users/hadop/hadoop-0.20.1+152/build/ivy/lib/Hadoo 
> p/common/core-3.1.1.jar" is already registered, and you are trying to 
> register an identical plugin located at URL "file:/Users/hado 
> p/hadoop-0.20.1+152/lib/core-3.1.1.jar." 
> at 
> org.datanucleus.plugin.

Release Management for 0.6.0

2010-06-15 Thread Ashish Thusoo
Hi Folks,

In order to release Hive more often, we are going to have a formal release 
manager for upcoming releases. Our goal is to release Hive more often than 
once every 6 months, so that features get pushed out with greater frequency in 
stable, well-tested releases. Carl Steinbach from Cloudera has graciously 
agreed to help us do this; he will be triaging and marking JIRAs that are an 
absolute must for 0.6.0, and coordinating the tasks needed to make a good 
release. Thanks again, Carl, for volunteering. We really appreciate it, and 
hopefully with this system we will be able to release Hive more frequently for 
our users.

Thanks,
Ashish


[jira] Assigned: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-1411:


Assignee: Carl Steinbach

> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.4.0, 0.4.1, 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpath. The URL 
> "file:/Users/hadop/hadoop-0.20.1+152/build/ivy/lib/Hadoo 
> p/common/core-3.1.1.jar" is already registered, and you are trying to 
> register an identical plugin located at URL "file:/Users/hado 
> p/hadoop-0.20.1+152/lib/core-3.1.1.jar." 
> at 
> org.datanucleus.plugin.NonManagedPluginRegistry.registerBundle(NonMan

[jira] Created: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-06-15 Thread Carl Steinbach (JIRA)
DataNucleus barfs if JAR appears more than once in CLASSPATH


 Key: HIVE-1411
 URL: https://issues.apache.org/jira/browse/HIVE-1411
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.4.1, 0.4.0
Reporter: Carl Steinbach
 Fix For: 0.6.0


DataNucleus barfs when more than one JAR with the same name appears on the 
CLASSPATH:

{code}
2010-03-06 12:33:25,565 ERROR exec.DDLTask (SessionState.java:printError(279)) 
- FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Unexpected 
exception caught. 
NestedThrowables: 
java.lang.reflect.InvocationTargetException 
org.apache.hadoop.hive.ql.metadata.HiveException: 
javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
NestedThrowables: 
java.lang.reflect.InvocationTargetException 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
NestedThrowables: 
java.lang.reflect.InvocationTargetException 
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
 
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:100)
 
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:74)
 
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
... 12 more 
Caused by: java.lang.reflect.InvocationTargetException 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) 
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
... 28 more 
Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
"org.eclipse.jdt.core" is already registered. Ensure you don't have multiple 
JAR versions of the same plugin in the classpath. The URL 
"file:/Users/hadop/hadoop-0.20.1+152/build/ivy/lib/Hadoop/common/core-3.1.1.jar" 
is already registered, and you are trying to register an identical plugin 
located at URL "file:/Users/hadop/hadoop-0.20.1+152/lib/core-3.1.1.jar." 
at 
org.datanucleus.plugin.NonManagedPluginRegistry.registerBundle(NonManagedPluginRegistry.java:437)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.registerBundle(NonManagedPluginRegistry.java:343)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.registerExtensions(NonManagedPluginRegistry.java:227)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.registerExtensionPoints(NonManagedPluginRegistry.java:159)
at 
org.datanucleus.plugin.PluginManager.registerExtensionPoints(PluginManager.java:82)
 
at org.datanucleu

[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1135:
-

Summary: Move hive language manual and tutorial to version control  (was: 
Move hive language manual and all wiki based documentation to forest)

> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
> jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed. Or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879177#action_12879177
 ] 

John Sichi commented on HIVE-1397:
--

+1.  Will commit if tests pass.


> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch, 
> HIVE-1397.2.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.
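
Since the UDAF is aggregate-valued, a typical end-to-end use writes the bins
somewhere plottable. A hedged usage sketch: table web_logs and column
latency_ms are hypothetical, and the function may ultimately ship as
histogram_numeric() per the rename discussed later in this digest.

{code:sql}
-- Build a 20-bin approximate histogram of a numeric column and write the
-- result out so it can be fed to Gnuplot or similar.
INSERT OVERWRITE DIRECTORY '/tmp/latency_histogram'
SELECT histogram(latency_ms, 20) FROM web_logs;
{code}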

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1135:
-

Status: Open  (was: Patch Available)

> Move hive language manual and all wiki based documentation to forest
> 
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
> jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed. Or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-06-15 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879173#action_12879173
 ] 

Carl Steinbach commented on HIVE-1096:
--

@Ed: Any update on this? I'd like to help out if you are low on cycles.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run: we can do string substitutions at that level, 
> and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.
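
Concretely, the requested behavior looks like this (a hedged sketch modeled on
the EMR feature quoted above; the script name report.q and table page_views
are hypothetical):

{code:sql}
-- Shell invocation, passing the variable on the command line:
--   hive -d DT=2009-12-09 -f report.q
-- report.q then references the variable via ${DT}:
SELECT page_url, count(1)
FROM page_views
WHERE dt = '${DT}'
GROUP BY page_url;
{code}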

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-06-15 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879171#action_12879171
 ] 

Carl Steinbach commented on HIVE-1096:
--

Parameter substitution in Pig: http://wiki.apache.org/pig/ParameterSubstitution


> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run: we can do string substitutions at that level, 
> and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1096:
-

Fix Version/s: 0.6.0
Affects Version/s: 0.5.0
  Component/s: Query Processor

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run: we can do string substitutions at that level, 
> and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1079) CREATE VIEW followup: derive dependencies on underlying base table partitions from view definition

2010-06-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879169#action_12879169
 ] 

John Sichi commented on HIVE-1079:
--

For the manual approach, a very simple thing we can start with is just letting 
users add partitions to views explicitly as a way of indicating that the 
underlying data is ready.
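
A hedged sketch of that manual approach; the ALTER VIEW syntax is hypothetical
(views take no partitions today), but the idea is that the view creator marks
a slice of underlying data as ready by adding its partition explicitly:

{code:sql}
-- Hypothetical syntax: declare that the view's 2010-06-15 slice is ready,
-- i.e. all underlying base table partitions it depends on have been loaded.
ALTER VIEW sales_summary ADD PARTITION (ds = '2010-06-15');
{code}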

> CREATE VIEW followup:  derive dependencies on underlying base table 
> partitions from view definition
> ---
>
> Key: HIVE-1079
> URL: https://issues.apache.org/jira/browse/HIVE-1079
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
>
> When querying a view, it would be useful to know which underlying base table 
> partitions it depends on in order to know how fresh the result is (or to be 
> able to wait until all of those partitions have been loaded consistently).  
> The task is to come up with a way to perform this analysis automatically 
> (possibly overconservatively), or alternately to let the view creator 
> annotate the view definition with this dependency information, or some 
> combination of the two.
> Note that this would be useful for any complex query which directly accesses 
> base tables (not just view definitions).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1171) Check Hadoop JAR shim dependencies into lib/

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1171.
--

Resolution: Won't Fix

> Check Hadoop JAR shim dependencies into lib/
> 
>
> Key: HIVE-1171
> URL: https://issues.apache.org/jira/browse/HIVE-1171
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Carl Steinbach
>
> In order to satisfy the shim dependencies we currently have
> Ivy configured to download four different versions, or 162Mb worth
> of Hadoop source tarballs from archive.apache.org. This includes
> a lot of junk that we don't actually need.
> We should instead pare this down to what we do need and check it 
> into the Hive source tree (no one has complained about problems
> syncing the svn repository, and we already have a bunch of JARs
> checked into lib/).
> Once Hadoop POMs become available we should shift back to Ivy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-847) support show databases

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-847:


 Assignee: Carl Steinbach  (was: He Yongqiang)
Fix Version/s: 0.6.0
Affects Version/s: 0.5.0
  Component/s: Metastore
   Query Processor

> support show databases
> --
>
> Key: HIVE-847
> URL: https://issues.apache.org/jira/browse/HIVE-847
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.5.0
>Reporter: Namit Jain
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-675) add database/scheme support Hive QL

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-675:


Fix Version/s: 0.6.0
Affects Version/s: 0.5.0
  Component/s: Metastore

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.5.0
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.
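
A hedged sketch of what the query-language side could look like; the exact
syntax is assumed from the attached patches and may differ from what is
eventually committed:

{code:sql}
-- Create a namespace and a table inside it (names are hypothetical).
CREATE DATABASE research;
USE research;
CREATE TABLE experiments (id INT, name STRING);
-- Qualified names would let queries reach across namespaces:
SELECT e.id FROM research.experiments e;
{code}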

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1192.
--

Resolution: Invalid

> Build fails when hadoop.version=0.20.1
> --
>
> Key: HIVE-1192
> URL: https://issues.apache.org/jira/browse/HIVE-1192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
> Attachments: hadoop-0.20.1.tar.gz.md5
>
>
> Setting hadoop.version=0.20.1 causes the build to fail since
> mirror.facebook.net/facebook/hive-deps does not have 0.20.1
> (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
> Suggested fix:
> * remove/ignore the hadoop.version configuration parameter
> or
> * Remove the patch numbers from these archives and use only the major.minor 
> numbers specified by the user to locate the appropriate tarball to download, 
> so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
> * Optionally create new tarballs that only contain the components that are 
> actually needed for the build (Hadoop jars), and remove things that aren't 
> needed (all of the source files).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Hive-Hbase integration problem, ask for help

2010-06-15 Thread Basab Maulik
I was not able to reproduce this problem on trunk (can't remember the
label). The funny thing was both the create table and the insert overwrite
worked even though the create table contained the invalid row format spec.

Basab

On Fri, Jun 11, 2010 at 1:33 PM, John Sichi  wrote:

> You should not be specifying any ROW FORMAT for an HBase table.
>
> From the log in your earlier post, I couldn't tell what was going wrong; I
> don't think it contained the full exception stacks.  You might be able to dig
> around in the actual log files to find more.
>
> JVS
> 
> From: Zhou Shuaifeng [zhoushuaif...@huawei.com]
> Sent: Thursday, June 10, 2010 7:26 PM
> To: hive-dev@hadoop.apache.org
> Cc: 'zhaozhifeng 00129982'
> Subject: Hive-Hbase integration problem, ask for help
>
> Hi Guys,
>
> I downloaded the Hive source from the SVN server, built it, and tried to run
> the Hive-HBase integration.
>
> It works well on all file-based Hive tables, but on the HBase-based tables
> the 'insert' command can't run successfully. The 'select' command runs
> fine.
>
> error info is below:
>
> hive> INSERT OVERWRITE TABLE hive_zsf SELECT * FROM zsf WHERE id=3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201006081948_0021, Tracking URL =
> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0021
> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0021
> 2010-06-09 16:05:43,898 Stage-0 map = 0%,  reduce = 0%
> 2010-06-09 16:06:12,131 Stage-0 map = 100%,  reduce = 100%
> Ended Job = job_201006081948_0021 with errors
>
> Task with the most failures(4):
> -
> Task ID:
>  task_201006081948_0021_m_00
>
> URL:
>  http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021
> <
> http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021&tipid=tas
> k_201006081948_0021_m_00>
> &tipid=task_201006081948_0021_m_00
> -
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
>
>
>
> I create a hbase-based table with hive, put some data into the hbase table
> through the hbase shell, and can select data from it through hive:
>
> CREATE TABLE hive_zsf1(id int, name string) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
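
A corrected version of the DDL above, per John Sichi's note at the top of this
reply (a hedged sketch: only the invalid ROW FORMAT clause is removed; the
HBase storage handler supplies its own SerDe):

{code:sql}
CREATE TABLE hive_zsf1(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
{code}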
>
> hbase(main):001:0> scan 'hive_zsf1'
> ROW  COLUMN+CELL
>
>  1   column=cf1:val, timestamp=1276157509028,
> value=zsf
>  2   column=cf1:val, timestamp=1276157539051,
> value=zzf
>  3   column=cf1:val, timestamp=1276157548247,
> value=zw
>  4   column=cf1:val, timestamp=1276157557115,
> value=cjl
> 4 row(s) in 0.0470 seconds
> hbase(main):002:0>
>
> hive> select * from hive_zsf1 where id=3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201006081948_0038, Tracking URL =
> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0038
> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0038
> 2010-06-11 10:25:42,049 Stage-1 map = 0%,  reduce = 0%
> 2010-06-11 10:25:45,090 Stage-1 map = 100%,  reduce = 0%
> 2010-06-11 10:25:48,133 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201006081948_0038
> OK
> 3   zw
> Time taken: 13.526 seconds
> hive>
>
>
>
>
>
> 
> -
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any
> use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
>
>


[jira] Updated: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1364:
-

Fix Version/s: 0.6.0

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.
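
A hedged illustration of how the limit is hit (table and columns are
hypothetical): every mapped column appends roughly "cf:name," to
hbase.columns.mapping, so a table with a few hundred columns pushes the
property value past the current 767-character ceiling.

{code:sql}
-- Each additional column grows the hbase.columns.mapping value; scale this
-- to a few hundred columns and the 767-character limit is exceeded.
CREATE TABLE hbase_wide(key string, c1 string, c2 string, c3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2,cf:c3");
{code}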

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1192) Build fails when hadoop.version=0.20.1

2010-06-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879161#action_12879161
 ] 

John Sichi commented on HIVE-1192:
--

I think we can close this one now?


> Build fails when hadoop.version=0.20.1
> --
>
> Key: HIVE-1192
> URL: https://issues.apache.org/jira/browse/HIVE-1192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
> Attachments: hadoop-0.20.1.tar.gz.md5
>
>
> Setting hadoop.version=0.20.1 causes the build to fail since
> mirror.facebook.net/facebook/hive-deps does not have 0.20.1
> (only 0.17.2.1, 0.18.3, 0.19.0, 0.20.0).
> Suggested fix:
> * remove/ignore the hadoop.version configuration parameter
> or
> * Remove the patch numbers from these archives and use only the major.minor 
> numbers specified by the user to locate the appropriate tarball to download, 
> so 0.20.0 and 0.20.1 would both map to hadoop-0.20.tar.gz.
> * Optionally create new tarballs that only contain the components that are 
> actually needed for the build (Hadoop jars), and remove things that aren't 
> needed (all of the source files).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1179) Add UDF array_contains

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879149#action_12879149
 ] 

Namit Jain commented on HIVE-1179:
--

The patch applies cleanly - will commit if the tests pass

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179-3.patch, 
> HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}
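
A hedged usage sketch following the signature given above; table orders and
column tags are hypothetical, and the argument order should be checked against
the final committed patch.

{code:sql}
-- Keep only rows whose tags array contains the element 'express'.
SELECT order_id FROM orders WHERE array_contains('express', tags);
{code}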

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-287:


Status: Open  (was: Patch Available)

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
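
A hedged workaround sketch until the multi-column form works: collapse both
columns into one distinct key (assumes string-compatible, non-NULL values and
a separator character that occurs in neither column).

{code:sql}
-- Equivalent single-key formulation of count(distinct col1, col2):
SELECT count(DISTINCT concat(col1, '|', col2)) FROM Tbl;
{code}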

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879147#action_12879147
 ] 

Namit Jain commented on HIVE-287:
-

Arvind, I am sorry about the delay in getting back to this.
Can you regenerate the patch - it is not applying cleanly. 
I will look at it right away

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1213) sort-merge join does not work for sub-queries

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1213:
-

   Status: Open  (was: Patch Available)
Fix Version/s: (was: 0.6.0)

> sort-merge join does not work for sub-queries
> -
>
> Key: HIVE-1213
> URL: https://issues.apache.org/jira/browse/HIVE-1213
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Attachments: Hive-1213.1.patch, Hive-1213.4.patch, Hive-1213.5.patch, 
> Hive-1213.6.patch, Hive-1213.7.patch, Hive-1213.8.patch, Hive-1213.9.patch
>
>
> A query like:
> select count(1) from (select /*+ MAPJOIN(x) */ from x join y ON ... ) subq;
> does not work - since there is no mapping between the join operator and the 
> corresponding source
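
A hedged, fully spelled-out instance of the failing pattern (the join column k
is hypothetical): a map-join hint inside a sub-query feeding an outer
aggregation.

{code:sql}
SELECT count(1)
FROM (
  SELECT /*+ MAPJOIN(x) */ x.k
  FROM x JOIN y ON (x.k = y.k)
) subq;
{code}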

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1213) sort-merge join does not work for sub-queries

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879145#action_12879145
 ] 

Namit Jain commented on HIVE-1213:
--

The patch does not apply cleanly

> sort-merge join does not work for sub-queries
> -
>
> Key: HIVE-1213
> URL: https://issues.apache.org/jira/browse/HIVE-1213
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Attachments: Hive-1213.1.patch, Hive-1213.4.patch, Hive-1213.5.patch, 
> Hive-1213.6.patch, Hive-1213.7.patch, Hive-1213.8.patch, Hive-1213.9.patch
>
>
> A query like:
> select count(1) from (select /*+ MAPJOIN(x) */ from x join y ON ... ) subq;
> does not work - since there is no mapping between the join operator and the 
> corresponding source

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-951:


Status: Open  (was: Patch Available)

> Selectively include EXTERNAL TABLE source files via REGEX
> -
>
> Key: HIVE-951
> URL: https://issues.apache.org/jira/browse/HIVE-951
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
> expression. 
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
> outside of Hive, and
> currently makes the assumption that all of the files located under the 
> supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's 
> often
> impractical or impossible to adjust the layout of the directory to meet the 
> requirements of 
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
> table based
> on the contents of an S3 bucket. 
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket with a filename 
> matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE 
> LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009.bz2' located 
> under hdfs://data/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879143#action_12879143
 ] 

Namit Jain commented on HIVE-951:
-

This implicitly links different tables to each other in a way that is not 
known to the metastore.

A house-cleaning operation, like archiving a partition, could then invalidate 
this new table.
Unless the metastore tracks that dependency, this seems risky.

> Selectively include EXTERNAL TABLE source files via REGEX
> -
>
> Key: HIVE-951
> URL: https://issues.apache.org/jira/browse/HIVE-951
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
> expression. 
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
> outside of Hive, and
> currently makes the assumption that all of the files located under the 
> supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's 
> often
> impractical or impossible to adjust the layout of the directory to meet the 
> requirements of 
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
> table based
> on the contents of an S3 bucket. 
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket with a filename 
> matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE 
> LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009.bz2' located 
> under hdfs://data/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1266) partitioning pruning should be more intelligent

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1266:
-

Status: Open  (was: Patch Available)

> partitioning pruning should be more intelligent
> --
>
> Key: HIVE-1266
> URL: https://issues.apache.org/jira/browse/HIVE-1266
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Attachments: hive-1266.1.patch
>
>
> Copying the mail from Adam:
> A badly written query:
> select .. from T where partitioning_column = 'p1' AND c1 = 100 or c2 = 200
> ...is a command I just foolishly ran: I should have put the disjunction in 
> parentheses.
> But the command actually touched every partition of T without a warning. Is 
> that a bug? When we force people to state a partition predicate, it seems we 
> are just looking for it to be referenced in the WHERE or ON clause, when 
> maybe we should be looking at conjuncts?
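
Spelled out, the precedence problem and its fix look like this (the select
list is kept generic with a hypothetical column col):

{code:sql}
-- AND binds tighter than OR, so the query above really means
--   (partitioning_column = 'p1' AND c1 = 100) OR c2 = 200,
-- and the unguarded c2 disjunct touches every partition. The intended,
-- prunable form groups the disjunction:
SELECT col FROM T
WHERE partitioning_column = 'p1' AND (c1 = 100 OR c2 = 200);
{code}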

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1266) partitioning pruning should be more intelligent

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879141#action_12879141
 ] 

Namit Jain commented on HIVE-1266:
--

The patch does not apply cleanly.

> partitioning pruning should be more intelligent
> --
>
> Key: HIVE-1266
> URL: https://issues.apache.org/jira/browse/HIVE-1266
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Attachments: hive-1266.1.patch
>
>
> Copying the mail from Adam:
> A badly written query:
> select .. from T where partitioning_column = 'p1' AND c1 = 100 or c2 = 200
> ...is a command I just foolishly ran: I should have put the disjunction in 
> parentheses.
> But the command actually touched every partition of T without a warning. Is 
> that a bug? When we force people to state a partition predicate, it seems we 
> are just looking for it to be referenced in the WHERE or ON clause, when 
> maybe we should be looking at conjuncts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1007) CombinedHiveInputFormat fails with empty input

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1007:
-

Status: Open  (was: Patch Available)

> CombinedHiveInputFormat fails with empty input
> --
>
> Key: HIVE-1007
> URL: https://issues.apache.org/jira/browse/HIVE-1007
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.1
>Reporter: Dave Lerman
>Assignee: Dave Lerman
> Attachments: hive.1007.1.patch
>
>
> In a multi-stage query, when one stage returns no data (resulting in a bunch 
> of output files with size 0), the next stage creates a job with 0 mappers 
> which just sits in the Hadoop task tracker forever and hangs the query at 0%.  
> The issue is that CombineHiveInputFormat looks for blocks to populate splits, 
> finds none (since the input is all 0 bytes), and then returns an empty array from 
> getSplits.
> There may be a good way to just skip that job altogether, but as a quick hack 
> to get it working, when there are no splits, I just create a single empty one 
> using the first path so that the job doesn't hang.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1007) CombinedHiveInputFormat fails with empty input

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879140#action_12879140
 ] 

Namit Jain commented on HIVE-1007:
--

Is this still a problem? I believe it was fixed in open source some time back.
If it is no longer an issue, can you close the JIRA?

> CombinedHiveInputFormat fails with empty input
> --
>
> Key: HIVE-1007
> URL: https://issues.apache.org/jira/browse/HIVE-1007
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.1
>Reporter: Dave Lerman
>Assignee: Dave Lerman
> Attachments: hive.1007.1.patch
>
>
> In a multi-stage query, when one stage returns no data (resulting in a bunch 
> of output files with size 0), the next stage creates a job with 0 mappers 
> which just sits in the Hadoop task tracker forever and hangs the query at 0%.  
> The issue is that CombineHiveInputFormat looks for blocks to populate splits, 
> finds none (since the input is all 0 bytes), and then returns an empty array from 
> getSplits.
> There may be a good way to just skip that job altogether, but as a quick hack 
> to get it working, when there are no splits, I just create a single empty one 
> using the first path so that the job doesn't hang.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1397:


Attachment: HIVE-1397.2.patch

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch, 
> HIVE-1397.2.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1409:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
   Resolution: Fixed

Committed. Thanks Paul

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0
>
> Attachments: HIVE-1409.1.patch, HIVE-1409.2.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.
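
A hedged repro sketch (table t is hypothetical): the partition records the
SequenceFile format it was created with, then the table level is switched to
RCFile, so the table metadata and the first partition's metadata disagree.

{code:sql}
CREATE TABLE t (c STRING) PARTITIONED BY (ds STRING) STORED AS SEQUENCEFILE;
ALTER TABLE t ADD PARTITION (ds = '2010-01-01');
ALTER TABLE t SET FILEFORMAT RCFILE;
-- A predicate matching no partitions previously picked up the first
-- partition's SequenceFile format rather than the table's RCFile format:
SELECT c FROM t WHERE ds = '2999-12-31';
{code}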

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1397:


Status: Patch Available  (was: Open)

Implemented changes discussed with JVS.

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch, 
> HIVE-1397.2.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1410) Add TCP keepalive option for the metastore server

2010-06-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879114#action_12879114
 ] 

Ning Zhang commented on HIVE-1410:
--

+1. Will test and commit.

> Add TCP keepalive option for the metastore server
> -
>
> Key: HIVE-1410
> URL: https://issues.apache.org/jira/browse/HIVE-1410
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1410.1.patch
>
>
> In production, we have noticed that the metastore server tends to accumulate 
> half-open TCP connections when it has been running for a long time. By 
> half-open, I am referring to idle connections where there is an established 
> TCP connection on the metastore server machine, but no corresponding 
> connection on the client machine. This could be the result of network 
> disconnects or crashed clients.
> This patch will add an option to turn on TCP keepalive so that these 
> half-open connections will get cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879094#action_12879094
 ] 

John Sichi commented on HIVE-1397:
--

histogram_numeric is fine.

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879093#action_12879093
 ] 

John Sichi commented on HIVE-1364:
--

Couldn't we just use a LOB?  So far so good with views (except for the Oracle 
mapping which we still need to address to get it to use CLOB instead of LONG 
VARCHAR).

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879092#action_12879092
 ] 

Prasad Chakka commented on HIVE-1364:
-

It used to be much higher in the beginning, but quite a few users reported 
problems on some MySQL DBs; 767 seemed to work on most of them. Before 
committing this, can someone test it on a few different DBs (with and without 
UTF encoding)?

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-1364:


Assignee: Carl Steinbach

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1364:
-

Attachment: HIVE-1364.patch

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879081#action_12879081
 ] 

Mayank Lahiri commented on HIVE-1397:
-

Does 'histogram_numeric()' work as a name? I'm also patching it to return a
list of (x,y) structs.
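
For reference, a sketch of what the array<struct<x,y>> return type could look
like in Hive's ObjectInspector layer (the wrapper class and method name here
are hypothetical; the factory calls are the standard SerDe API):

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class HistogramOutputOI {
  // Declare the UDAF's final return type as
  // array<struct<x:double,y:double>> instead of map<double,double>.
  public static ObjectInspector get() {
    List<String> fieldNames = Arrays.asList("x", "y");
    List<ObjectInspector> fieldOIs = Arrays.asList(
        (ObjectInspector) PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
        (ObjectInspector) PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
    return ObjectInspectorFactory.getStandardListObjectInspector(
        ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs));
  }
}
{code}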

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.
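
The core update step of the Ben-Haim/Tom-Tov sketch referenced above, reduced
to a minimal standalone Java sketch (not the attached patch's code): keep at
most maxBins (centroid, count) bins sorted by centroid; insert each value as a
unit-weight bin, and when the budget is exceeded, merge the two closest
neighbors into their weighted average.

{code}
import java.util.ArrayList;
import java.util.List;

public class StreamingHistogram {
  // One bin: a centroid x and a point count y.
  private static class Bin {
    double x, y;
    Bin(double x, double y) { this.x = x; this.y = y; }
  }

  private final int maxBins;
  private final List<Bin> bins = new ArrayList<Bin>(); // kept sorted by x

  public StreamingHistogram(int maxBins) { this.maxBins = maxBins; }

  public void add(double p) {
    // Insert a unit-weight bin in sorted position.
    int i = 0;
    while (i < bins.size() && bins.get(i).x < p) {
      i++;
    }
    bins.add(i, new Bin(p, 1));
    // Over budget: merge the adjacent pair with the smallest gap.
    if (bins.size() > maxBins) {
      int best = 0;
      for (int j = 1; j + 1 < bins.size(); j++) {
        if (bins.get(j + 1).x - bins.get(j).x
            < bins.get(best + 1).x - bins.get(best).x) {
          best = j;
        }
      }
      Bin a = bins.get(best);
      Bin b = bins.remove(best + 1);
      a.x = (a.x * a.y + b.x * b.y) / (a.y + b.y);
      a.y += b.y;
    }
  }
}
{code}

Space stays proportional to maxBins, matching the description; a production
version would use binary search or a tree rather than the linear scans here.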

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879077#action_12879077
 ] 

HBase Review Board commented on HIVE-1397:
--

Message from: "John Sichi" 


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java,
 line 346
bq.  > 
bq.  >
bq.  > Since eventually we would like to support histograms on non-numeric 
fields such as STRING, I think we should rename this one numeric_histogram 
(likewise for the Java class) to avoid confusion later when we have other 
algorithms.
bq.  >
bq.  
bq.  Mayank Lahiri wrote:
bq.  It might seem a little odd since histograms are generally used as 
approximations of numerical distributions. I would suggest either (a) 
overloading histogram() to behave differently on STRING arguments (perhaps 
STRING arguments that cause a NumberFormatException), or (b) creating a 
factor_histogram() function for general strings. 
bq.  
bq.  I could add in the code for computing frequencies of STRINGs quite 
easily, although there's no way to prevent it from choking if there are too 
many unique strings.

I don't think we should handle strings now, but we should rename this one to 
make it clear that it only works on numeric data.  And per our discussion 
offline, reject attempts to use it on non-numeric data.


- John
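
The up-front rejection John describes could take roughly this shape inside a
GenericUDAF resolver (a sketch against the Hive type APIs; the enclosing
resolver class is omitted):

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

public class NumericArgCheck {
  // Reject non-numeric argument types at plan time rather than
  // failing row-by-row during execution.
  public static void check(TypeInfo arg) throws UDFArgumentTypeException {
    if (arg.getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(0,
          "Only numeric arguments are accepted, got " + arg.getTypeName());
    }
    switch (((PrimitiveTypeInfo) arg).getPrimitiveCategory()) {
    case BYTE:
    case SHORT:
    case INT:
    case LONG:
    case FLOAT:
    case DOUBLE:
      return; // numeric: accepted
    default:
      throw new UDFArgumentTypeException(0,
          "Only numeric arguments are accepted, got " + arg.getTypeName());
    }
  }
}
{code}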


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/181/#review224
---





> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879061#action_12879061
 ] 

Ning Zhang commented on HIVE-1139:
--

I'm not aware of an efficient serde that is reflection-based. The 
XMLEncoder/Decoder is JAXB-based and very inefficient; that's another reason we 
don't want to use it in the execution code path. A better way is to use a Hive 
SerDe (e.g., LazyBinarySerDe), just as is done in RowContainer.

Another way to tackle this problem is to have a more accurate estimate of how 
many rows can fit into main memory. The current code checks the amount of 
available memory and uses 0.25 (by default) of it to hold the hashmap. We set 
it to 0.15 in our environment and it works for most cases; 0.15 is probably a 
little conservative. Some experiments need to be done to tune this parameter so 
that most cases fit into main memory and secondary storage is used only in the 
exceptional cases. 
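
A minimal sketch of the kind of estimate described in the second paragraph
(the 0.25/0.15 heap fraction plus an assumed average entry size; the names
here are hypothetical, not the GroupByOperator's actual fields):

{code}
public class HashAggrMemoryCheck {
  private final double heapFraction; // e.g. 0.25 default, 0.15 tuned
  private final long avgEntryBytes;  // estimated bytes per hashmap entry

  public HashAggrMemoryCheck(double heapFraction, long avgEntryBytes) {
    this.heapFraction = heapFraction;
    this.avgEntryBytes = avgEntryBytes;
  }

  // Flush (or spill) the partial-aggregation hashmap once its estimated
  // footprint exceeds the configured fraction of the JVM heap.
  public boolean shouldFlush(long numEntries) {
    long budget = (long) (Runtime.getRuntime().maxMemory() * heapFraction);
    return numEntries * avgEntryBytes > budget;
  }
}
{code}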

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
> Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This can lead to an OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper processes fewer rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1397:
-

Status: Open  (was: Patch Available)

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-15 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879054#action_12879054
 ] 

HBase Review Board commented on HIVE-1397:
--

Message from: "Mayank Lahiri" 


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java,
 line 346
bq.  > 
bq.  >
bq.  > Since eventually we would like to support histograms on non-numeric 
fields such as STRING, I think we should rename this one numeric_histogram 
(likewise for the Java class) to avoid confusion later when we have other 
algorithms.
bq.  >

It might seem a little odd since histograms are generally used as 
approximations of numerical distributions. I would suggest either (a) 
overloading histogram() to behave differently on STRING arguments (perhaps 
STRING arguments that cause a NumberFormatException), or (b) creating a 
factor_histogram() function for general strings. 

I could add in the code for computing frequencies of STRINGs quite easily, 
although there's no way to prevent it from choking if there are too many unique 
strings.


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java,
 line 239
bq.  > 
bq.  >
bq.  > Shouldn't this method be private?

yes! thanks!


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java,
 line 186
bq.  > 
bq.  >
bq.  > Formatting nitpick:  Here and elsewhere, use braces even for 
single-line blocks; we use the NeedBraces checkstyle rule to flag this.
bq.  > 
bq.  > 
http://stackoverflow.com/questions/382633/can-the-checkstyle-module-needbraces-work-with-nested-if-else-blocks
bq.  >

Will re-submit patch, thanks.


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java,
 line 362
bq.  > 
bq.  >
bq.  > Under what conditions can this exception be encountered?  Shouldn't 
it be impossible since we already checked the type up front?

This could happen when a STRING row contains a non-numeric value. As a 
follow-up to my earlier comment, we could either drop the value or somehow 
"intelligently" switch to computing the histogram over strings instead of 
doubles.


bq.  On None, John Sichi wrote:
bq.  > 
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java,
 line 153
bq.  > 
bq.  >
bq.  > Is it possible to return an ARRAY 
instead?  That seems more natural (and compact) than a MAP.
bq.  > 
bq.  > But if you already got feedback that MAP is prefereable, ignore this 
comment.
bq.  >

Not a problem. I was using the map to avoid an extra level of indirection, and 
possibly to make it compatible with an explode() extension that explodes maps 
as well as arrays. 


- Mayank


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/181/#review224
---





> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1410) Add TCP keepalive option for the metastore server

2010-06-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1410:


Status: Patch Available  (was: Open)

> Add TCP keepalive option for the metastore server
> -
>
> Key: HIVE-1410
> URL: https://issues.apache.org/jira/browse/HIVE-1410
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1410.1.patch
>
>
> In production, we have noticed that the metastore server tends to accumulate 
> half-open TCP connections when it has been running for a long time. By 
> half-open, I am referring to idle connections where there is an established 
> TCP connection on the metastore server machine, but no corresponding 
> connection on the client machine. This could be the result of network 
> disconnects or crashed clients.
> This patch will add an option to turn on TCP keepalive so that these 
> half-open connections will get cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-15 Thread Soundararajan Velu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879051#action_12879051
 ] 

Soundararajan Velu commented on HIVE-1139:
--

Right... Ning, are there any open-source serializers/deserializers you are 
aware of that are reflection-based? If so, I can quickly implement a similar 
persistent map around one.

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
> Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This can lead to an OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper processes fewer rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1410) Add TCP keepalive option for the metastore server

2010-06-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1410:


Attachment: HIVE-1410.1.patch

No unit test for this one, but I ran 2 metastore servers, one with and one 
without this patch to verify that the one with the patch did not accumulate 
half-open connections.
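
One plausible shape for such an option, as a sketch rather than the attached
patch: a TServerSocket subclass that enables SO_KEEPALIVE on every accepted
connection (Thrift's TServerSocket/TSocket API assumed).

{code}
import java.net.SocketException;

import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransportException;

public class TServerSocketKeepAlive extends TServerSocket {
  public TServerSocketKeepAlive(int port) throws TTransportException {
    super(port);
  }

  @Override
  protected TSocket acceptImpl() throws TTransportException {
    // With SO_KEEPALIVE on, the OS periodically probes idle peers and
    // tears down connections whose client has silently disappeared.
    TSocket ts = super.acceptImpl();
    try {
      ts.getSocket().setKeepAlive(true);
    } catch (SocketException e) {
      throw new TTransportException(e);
    }
    return ts;
  }
}
{code}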

> Add TCP keepalive option for the metastore server
> -
>
> Key: HIVE-1410
> URL: https://issues.apache.org/jira/browse/HIVE-1410
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1410.1.patch
>
>
> In production, we have noticed that the metastore server tends to accumulate 
> half-open TCP connections when it has been running for a long time. By 
> half-open, I am referring to idle connections where there is an established 
> TCP connection on the metastore server machine, but no corresponding 
> connection on the client machine. This could be the result of network 
> disconnects or crashed clients.
> This patch will add an option to turn on TCP keepalive so that these 
> half-open connections will get cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1410) Add TCP keepalive option for the metastore server

2010-06-15 Thread Paul Yang (JIRA)
Add TCP keepalive option for the metastore server
-

 Key: HIVE-1410
 URL: https://issues.apache.org/jira/browse/HIVE-1410
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang


In production, we have noticed that the metastore server tends to accumulate 
half-open TCP connections when it has been running for a long time. By 
half-open, I am referring to idle connections where there is an established TCP 
connection on the metastore server machine, but no corresponding connection on 
the client machine. This could be the result of network disconnects or crashed 
clients.

This patch will add an option to turn on TCP keepalive so that these half-open 
connections will get cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879041#action_12879041
 ] 

Namit Jain commented on HIVE-1409:
--

+1

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1409.1.patch, HIVE-1409.2.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.
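
The proposed fallback boils down to an ordering decision; a self-contained
sketch (all types here are illustrative stand-ins, not Hive's real metadata
classes):

{code}
import java.util.List;

public class TableFormatFallback {
  interface StorageDesc { String inputFormatClassName(); }
  interface Tbl extends StorageDesc {}
  interface Part extends StorageDesc {}

  // If partition pruning matched nothing, describe the scan with the
  // table's own storage descriptor instead of an arbitrary first
  // partition (which may use a different file format).
  static String inputFormatFor(Tbl table, List<Part> prunedParts) {
    StorageDesc src = prunedParts.isEmpty() ? table : prunedParts.get(0);
    return src.inputFormatClassName();
  }
}
{code}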

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-543) provide option to run hive in local mode

2010-06-15 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879039#action_12879039
 ] 

Joydeep Sen Sarma commented on HIVE-543:


#2 - this piece of code got quite messed up by the changes for parallel 
execution (HIVE-549). the initial synchronized block protected access to a 
single global variable. this was replaced by a synchronized map (gWorkMap) - 
but the surrounding synchronized block was never taken out (it's now 
unnecessary because of the synchronized map). also, the (gWork==null) check 
inside the synchronized section was redundant (it made sense when there was a 
singleton pattern - but with the synchronized map it doesn't). see the sketch 
after these points.

#1 - i don't understand this. there's already a hadoop parameter 
(mapred.local.dir) for specifying the local scratch directory. i didn't add any 
new parameters as far as hive scratch directories are concerned .. is the 
concern about automatically selecting a local intermediate directory for local 
mode execution? - that should be ok.

#3 - mystery to me as well. the ordering of the output lines has changed (not 
the content). the diff script is not able to ignore these changes.

> provide option to run hive in local mode
> 
>
> Key: HIVE-543
> URL: https://issues.apache.org/jira/browse/HIVE-543
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-534.patch.2, hive-543.patch.1
>
>
> this is a little bit more than just mapred.job.tracker=local
> when run in this mode - multiple jobs are a problem since they write to the 
> same tmp directories. the following options:
> hadoop.tmp.dir
> mapred.local.dir
> need to be randomized (perhaps based on the query id). 
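
A sketch of the randomization the description asks for (the helper and its
placement are hypothetical; the two property names are from the description,
and comma-separated dir lists are handled):

{code}
import org.apache.hadoop.conf.Configuration;

public final class LocalScratchDirs {
  // Suffix each local-mode tmp dir with the query id so that multiple
  // concurrent local jobs do not write into the same directories.
  public static void perQuery(Configuration conf, String queryId) {
    for (String key : new String[] {"hadoop.tmp.dir", "mapred.local.dir"}) {
      String value = conf.get(key);
      if (value == null) {
        continue;
      }
      StringBuilder sb = new StringBuilder();
      for (String dir : value.split(",")) {
        if (sb.length() > 0) {
          sb.append(',');
        }
        sb.append(dir).append('/').append(queryId);
      }
      conf.set(key, sb.toString());
    }
  }
}
{code}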

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1386) HiveQL SQL Compliance (Umbrella)

2010-06-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879037#action_12879037
 ] 

Ning Zhang commented on HIVE-1386:
--

I've updated the wiki to include "order by". Please check it out at 
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy

> HiveQL SQL Compliance (Umbrella)
> 
>
> Key: HIVE-1386
> URL: https://issues.apache.org/jira/browse/HIVE-1386
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>
> This is an umbrella ticket to track work related to HiveQL compliance with 
> the SQL standard, e.g. supported query syntax, data types, views, catalog 
> access, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1409:


Attachment: HIVE-1409.2.patch

* Drop table added

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1409.1.patch, HIVE-1409.2.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1409:


Status: Patch Available  (was: Open)

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1409.1.patch, HIVE-1409.2.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1403) Reporting progress to JT during closing files in FileSinkOperator

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1403:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
   Resolution: Fixed

Committed. Thanks Ning!

> Reporting progress to JT during closing files in FileSinkOperator
> -
>
> Key: HIVE-1403
> URL: https://issues.apache.org/jira/browse/HIVE-1403
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1403.patch
>
>
> If there are too many files that need to be closed in FileSinkOperator 
> (e.g., if DynamicPartition/FileSpray is turned on), many files can be 
> generated by each task, and they all need to be closed in 
> FileSinkOperator.closeOp(). If the NN is overloaded, each file close can take 
> more than 1 sec. This sometimes makes the JT think the task is dead, since 
> closing all the files takes too long without any progress report. We need to 
> report progress periodically during file closing. 
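
The fix amounts to pinging the JobTracker while the close loop runs; a sketch
(Writer is a stand-in for the operator's record writers, and the
once-per-second threshold is illustrative):

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.mapred.Reporter;

public class CloseWithProgress {
  interface Writer {
    void close() throws IOException;
  }

  // Close every output file, calling Reporter.progress() at most about
  // once a second so a slow NN cannot make the JT declare the task dead.
  static void closeAll(List<Writer> writers, Reporter reporter)
      throws IOException {
    long lastReport = System.currentTimeMillis();
    for (Writer w : writers) {
      w.close(); // can take >1 sec each when the NN is overloaded
      long now = System.currentTimeMillis();
      if (now - lastReport >= 1000) {
        reporter.progress();
        lastReport = now;
      }
    }
  }
}
{code}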

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-543) provide option to run hive in local mode

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879002#action_12879002
 ] 

Namit Jain commented on HIVE-543:
-

Some minor comments:

1. Do you want to add a parameter for specifying the local directory?
2. Not sure about the removal of synchronization around gWorkMap - let us go 
over it offline.
3. Need to see in more detail why the output changed for only 2 join tests.

> provide option to run hive in local mode
> 
>
> Key: HIVE-543
> URL: https://issues.apache.org/jira/browse/HIVE-543
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-534.patch.2, hive-543.patch.1
>
>
> this is a little bit more than just mapred.job.tracker=local
> when run in this mode - multiple jobs are a problem since they write to the 
> same tmp directories. the following options:
> hadoop.tmp.dir
> mapred.local.dir
> need to be randomized (perhaps based on the query id). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878992#action_12878992
 ] 

Namit Jain commented on HIVE-1409:
--

After more review: please drop the table at the end of the test that you 
added. Leaving it in place can lead to non-deterministic results for other 
tests.

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1409.1.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1409) File format information is retrieved from first partition

2010-06-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1409:
-

Status: Open  (was: Patch Available)

> File format information is retrieved from first partition
> -
>
> Key: HIVE-1409
> URL: https://issues.apache.org/jira/browse/HIVE-1409
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1409.1.patch
>
>
> Currently, if no partitions match the partition predicate, the first 
> partition is used to retrieve the file format. This can cause a problem if 
> the table is set to use RCFile, but the first partition uses SequenceFile:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.createKey(SequenceFileRecordReader.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:76)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createKey(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.shims.Hadoop20Shims$CombineFileRecordReader.createKey(Hadoop20Shims.java:212)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:167)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer.<init>()
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
>   ... 9 more
> {code}
> The proposed change is to use the table's metadata in such cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-897) fix inconsistent expectations from table/partition location value

2010-06-15 Thread Soundararajan Velu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878940#action_12878940
 ] 

Soundararajan Velu commented on HIVE-897:
-

Prasad, do we have any details on this issue? We were looking at fixing it, 
but the problem statement is very abstract.

> fix inconsistent expectations from table/partition location value
> -
>
> Key: HIVE-897
> URL: https://issues.apache.org/jira/browse/HIVE-897
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
>
> currently code expects this to be full URI in some locations 
> (LoadSemanticAnalyzer). Also HiveAlterHandle should work in either case. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Alter table for rename column is not working

2010-06-15 Thread Carl Steinbach
Hi Jaydeep,

The ability to change column names first became available in Hive 0.5. This
feature is not present in Hive 0.4.1 or any earlier release. I tried your
example on Hive 0.4.1 and verified that it fails with the same error. It
works on Hive 0.5.

Thanks.

Carl

On Mon, Jun 14, 2010 at 11:28 PM, jaydeep vishwakarma <
jaydeep.vishwaka...@mkhoj.com> wrote:

> Hi,
>
> I just copied and pasted a Hive tutorial example. I don't know why column
> rename is not working.
>
> I ran the following statement:
>
> hive> CREATE TABLE test_change (a int, b int, c int);
> OK
> Time taken: 0.227 seconds
> hive> ALTER TABLE test_change CHANGE a a1 INT;
> FAILED: Parse Error: line 1:12 cannot recognize input 'test_change' in
> alter statement
>
>
> Regards,
> Jaydeep
>