[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2009-10-13 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765362#action_12765362
 ] 

Min Zhou commented on HIVE-842:
---

@Edward

Kerberos for authentication is a good way, I think; user/password is not needed 
here.  This issue can be implemented in the future.
BTW, we've finished the development of the authorization infrastructure for Hive.  

 Authentication Infrastructure for Hive
 --

 Key: HIVE-842
 URL: https://issues.apache.org/jira/browse/HIVE-842
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Edward Capriolo

 This issue deals with the authentication (user name, password) infrastructure, 
 not the authorization components that specify what a user should be able to 
 do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2009-09-21 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758112#action_12758112
 ] 

Min Zhou commented on HIVE-78:
--

@Namit

Got your meaning.  We are maintaining a version of our own; it will need a 
couple of weeks to adapt it to trunk.

 Authorization infrastructure for Hive
 -

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
 hive-78-syntax-v1.patch, hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2009-09-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757616#action_12757616
 ] 

Min Zhou commented on HIVE-78:
--

sorry, 
{noformat}
public class GenericAuthenticator extends Authenticator {
  public GenericAuthenticator(Hive db, User user);
  ...
}
{noformat}

 Authorization infrastructure for Hive
 -

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2009-09-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757622#action_12757622
 ] 

Min Zhou commented on HIVE-78:
--

Oops, my code wasn't on my machine; I just pasted yours and modified it into 
mine. 
Here is a patch showing my code for that.


 Authorization infrastructure for Hive
 -

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
 hive-78-syntax-v1.patch, hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

2009-09-18 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-78:
-

Attachment: createuser-v1.patch

 Authorization infrastructure for Hive
 -

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
 hive-78-syntax-v1.patch, hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756904#action_12756904
 ] 

Min Zhou commented on HIVE-78:
--

Let me guess: you are all talking about the CLI. But we are using HiveServer as 
a multi-user server, like mysqld, not something that supports only one user.

 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756949#action_12756949
 ] 

Min Zhou commented on HIVE-78:
--

I do not think the HiveServer you have in mind is the same as mine, which 
supports multiple users, not only one.

 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756951#action_12756951
 ] 

Min Zhou commented on HIVE-78:
--

From the words in your comment:
{noformat}
Daemons like HiveService and HiveWebInterface will have to run as supergroup or 
a hive group? 
{noformat}

 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

2009-09-16 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-78:
-

Attachment: hive-78-metadata-v1.patch

 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-16 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756335#action_12756335
 ] 

Min Zhou commented on HIVE-78:
--

@Edward

Sorry for my misuse of some words; I hope this will not affect our work.

Can you give me the JIRAs where it was decided that Hive will not store 
username/password information and Hadoop will?
I think most companies are using Hadoop versions from 0.17 to 0.20, which 
don't have good password security. Once a company adopts a particular 
version, upgrading is a major undertaking, so many companies will stay on a 
more stable version. Moreover, Hadoop still does not have that feature, and it 
may take a very long time to implement. Why should we wait for it rather than 
accomplish it ourselves? I think Hive needs to support user/password at least 
for the current versions of Hadoop. Many companies using Hive have reported 
that the current Hive is inconvenient for multi-user scenarios, in terms of 
environment isolation, table sharing, security, etc. We must try to meet the 
requirements of most of them.

Regarding the syntax, I guess we can do it in two steps:
# support GRANT/REVOKE of privileges to users.
# support some sort of server administration privileges, as Ashish mentioned.
The GRANT statement enables system administrators to create Hive user accounts 
and to grant rights to accounts. To use GRANT, you must have the GRANT OPTION 
privilege, and you must have the privileges that you are granting. The REVOKE 
statement is the counterpart and enables administrators to remove account 
privileges.

File hive-78-syntax-v1.patch modifies the syntax. Any comments on that?
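
As a rough illustration of the GRANT rule described above (the grantor needs 
GRANT OPTION plus every privilege being granted), here is a minimal Java 
sketch; the Privilege enum and method names are hypothetical and not taken 
from hive-78-syntax-v1.patch.
{code}
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the GRANT precondition described above; not from the patch.
public final class GrantCheck {
  enum Privilege { SELECT, INSERT, DROP, GRANT_OPTION }

  // A user may grant a set of privileges only if it holds GRANT OPTION
  // and already holds every privilege it is trying to grant.
  static boolean mayGrant(Set<Privilege> grantorPrivs, Set<Privilege> requested) {
    return grantorPrivs.contains(Privilege.GRANT_OPTION)
        && grantorPrivs.containsAll(requested);
  }

  public static void main(String[] args) {
    Set<Privilege> grantor = EnumSet.of(Privilege.GRANT_OPTION, Privilege.SELECT, Privilege.INSERT);
    System.out.println(mayGrant(grantor, EnumSet.of(Privilege.SELECT)));  // true
    System.out.println(mayGrant(grantor, EnumSet.of(Privilege.DROP)));    // false: DROP not held
  }
}
{code}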


 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-15 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755876#action_12755876
 ] 

Min Zhou commented on HIVE-78:
--

We will take over this issue; it should be finished in two weeks.  Here are the 
SQL statements that will be added:
{noformat}
CREATE USER;
DROP USER;
ALTER USER SET PASSWORD;
GRANT;
REVOKE;
{noformat}

Metadata is stored on some sort of persistent medium, such as a MySQL DBMS, 
through JDO.  We will add three tables for this issue: USER, DBS_PRIV, and 
TABLES_PRIV. Privileges can be granted at several levels, and each table above 
corresponds to a privilege level. 
#  Global level
Global privileges apply to all databases on a given server. These privileges 
are stored in the USER table. GRANT ALL ON *.* and REVOKE ALL ON *.* grant and 
revoke only global privileges. 
GRANT ALL ON *.* TO 'someuser';
GRANT SELECT, INSERT ON *.* TO 'someuser';

#  Database level
Database privileges apply to all objects in a given database. These privileges 
are stored in the DBS_PRIV table. GRANT ALL ON db_name.* and REVOKE ALL ON 
db_name.* grant and revoke only database privileges. 
GRANT ALL ON mydb.* TO 'someuser';
GRANT SELECT, INSERT ON mydb.* TO 'someuser';
Although we can't create databases currently, this reserves a place for them 
until Hive supports it.

# Table level
Table privileges apply to all columns in a given table. These privileges are 
stored in the TABLES_PRIV table. GRANT ALL ON db_name.tbl_name and REVOKE ALL 
ON db_name.tbl_name grant and revoke only table privileges. 
GRANT ALL ON mydb.mytbl TO 'someuser';
GRANT SELECT, INSERT ON mydb.mytbl TO 'someuser';

Hive account information is stored in the USER table, including the username, 
password, and the various privileges. A user who has been granted any privilege 
on a particular table, such as select/insert/drop, always has the right to show 
that table.
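
A minimal Java sketch of how the three privilege levels above (USER, DBS_PRIV, 
TABLES_PRIV) could be consulted, checking from the broadest scope to the 
narrowest; all class, method, and enum names here are illustrative assumptions, 
not code from the attached patches.
{code}
import java.util.*;

// Illustrative only: resolve a privilege by checking the global (USER) level,
// then the database (DBS_PRIV) level, then the table (TABLES_PRIV) level,
// mirroring GRANT ... ON *.*, GRANT ... ON db.*, and GRANT ... ON db.tbl.
public class PrivilegeChecker {
  enum Privilege { SELECT, INSERT, DROP, ALL }

  private final Set<Privilege> globalPrivs = new HashSet<>();           // USER
  private final Map<String, Set<Privilege>> dbPrivs = new HashMap<>();  // DBS_PRIV, keyed by db
  private final Map<String, Set<Privilege>> tblPrivs = new HashMap<>(); // TABLES_PRIV, keyed by "db.tbl"

  public void grantGlobal(Privilege p) { globalPrivs.add(p); }

  public void grantDb(String db, Privilege p) {
    dbPrivs.computeIfAbsent(db, k -> new HashSet<>()).add(p);
  }

  public void grantTable(String db, String tbl, Privilege p) {
    tblPrivs.computeIfAbsent(db + "." + tbl, k -> new HashSet<>()).add(p);
  }

  // Broadest scope wins first; ALL at any level implies the requested privilege.
  public boolean has(Privilege p, String db, String tbl) {
    return holds(globalPrivs, p)
        || holds(dbPrivs.getOrDefault(db, Collections.emptySet()), p)
        || holds(tblPrivs.getOrDefault(db + "." + tbl, Collections.emptySet()), p);
  }

  private static boolean holds(Set<Privilege> privs, Privilege p) {
    return privs.contains(Privilege.ALL) || privs.contains(p);
  }
}
{code}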



 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

2009-09-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-78:
-

Attachment: hive-78-syntax-v1.patch

 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-syntax-v1.patch, hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-15 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755882#action_12755882
 ] 

Min Zhou commented on HIVE-78:
--

 We currently use separate MySQL databases to achieve an isolated CLI 
environment, which is not practical. An authentication infrastructure is 
urgently needed for us.

Almost all statements would be affected, for example:
SELECT
INSERT
SHOW TABLES
SHOW PARTITIONS
DESCRIBE TABLE
MSCK
CREATE TABLE
CREATE FUNCTION -- we are considering how to control who can create UDFs.
DROP TABLE
DROP FUNCTION
LOAD
plus GRANT/REVOKE themselves, and CREATE USER/DROP USER/SET PASSWORD. It even 
includes some non-SQL commands such as set, add file, and add jar. 


 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-syntax-v1.patch, hive-78.diff


 Allow hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-818) Create a Hive CLI that connects to hive ThriftServer

2009-09-08 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752852#action_12752852
 ] 

Min Zhou commented on HIVE-818:
---

This feature looks pretty good for us; we were looking for a CLI-mode client 
for the Hive server.

 Create a Hive CLI that connects to hive ThriftServer
 

 Key: HIVE-818
 URL: https://issues.apache.org/jira/browse/HIVE-818
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Clients, Server Infrastructure
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 We should have an alternate CLI that works by interacting with the 
 HiveServer; this way it will be ready when/if we deprecate the current CLI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-814) exception alter a int typed column to date/datetime/timestamp

2009-09-03 Thread Min Zhou (JIRA)
exception alter a int typed column to date/datetime/timestamp
-

 Key: HIVE-814
 URL: https://issues.apache.org/jira/browse/HIVE-814
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Min Zhou


As far as I know, time types can only be used in partitions; normal columns are 
not allowed to be set to those types. However, it turns out that a non-time 
column can be altered to date/datetime/timestamp, and an exception will then be 
thrown when describing the table.

hive> create table pokes(foo int, bar string);
OK
Time taken: 0.894 seconds
hive> alter table pokes replace columns(foo date, bar string);
OK
Time taken: 0.266 seconds

hive> describe pokes;
FAILED: Error in metadata: 
MetaException(message:java.lang.IllegalArgumentException Error: type expected 
at the position 0 of 'date:string' but 'date' is found.)
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-607) Create statistical UDFs.

2009-07-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736473#action_12736473
 ] 

Min Zhou commented on HIVE-607:
---

@Namit
I implemented group_cat() in a rush, and found some things difficult to solve: 
1. function group_cat() has an internal order by clause; currently, we can't 
implement such an aggregation in hive. 
2. when the string to be group-concatenated is too large, in other words when 
data skew appears, there is often not enough memory to store such a big string.

 Create statistical UDFs.
 

 Key: HIVE-607
 URL: https://issues.apache.org/jira/browse/HIVE-607
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Emil Ibrishimov
Priority: Minor
 Fix For: 0.4.0

 Attachments: HIVE-607.1.patch, UDAFStddev.java


 Create UDFs replicating:
 STD() Return the population standard deviation
 STDDEV_POP()(v5.0.3)  Return the population standard deviation
 STDDEV_SAMP()(v5.0.3) Return the sample standard deviation
 STDDEV()  Return the population standard deviation
 SUM() Return the sum
 VAR_POP()(v5.0.3) Return the population standard variance
 VAR_SAMP()(v5.0.3)Return the sample variance
 VARIANCE()(v4.1)  Return the population standard variance
 as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-607) Create statistical UDFs.

2009-07-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736475#action_12736475
 ] 

Min Zhou commented on HIVE-607:
---

sorry, some typo

@Namit
I've implemented group_cat() in a rush, and found some things difficult to 
solve (a naive sketch illustrating both problems follows):
1. function group_cat() has an internal order by clause; currently, we can't 
implement such an aggregation in hive.
2. when the strings to be group-concatenated are too large, in other words, if 
data skew appears, there is often not enough memory to store such a big result.
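
To make the two problems concrete, here is a naive old-style UDAF sketch of 
group_cat (the class name is hypothetical; this is not an attached patch): 
rows reach iterate() in arbitrary order, so the internal ORDER BY cannot be 
honored, and the whole concatenation lives in one in-memory buffer, which 
blows up on skewed groups.
{code}
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

// Naive sketch only; illustrates the two problems listed above.
public class UDAFGroupCat extends UDAF {

  public static class GroupCatEvaluator implements UDAFEvaluator {
    private StringBuilder buffer;
    private boolean empty;

    public GroupCatEvaluator() {
      init();
    }

    public void init() {
      buffer = new StringBuilder();
      empty = true;
    }

    // Problem 1: rows arrive in whatever order the mappers produce them,
    // so there is no way to apply group_cat's internal ORDER BY here.
    public boolean iterate(String value) {
      if (value != null) {
        if (!empty) {
          buffer.append(',');
        }
        buffer.append(value);
        empty = false;
      }
      return true;
    }

    // Problem 2: the partial result is the entire concatenation so far, so a
    // skewed group keeps one huge string in memory on a single reducer.
    public String terminatePartial() {
      return empty ? null : buffer.toString();
    }

    public boolean merge(String partial) {
      return iterate(partial);
    }

    public String terminate() {
      return empty ? null : buffer.toString();
    }
  }
}
{code}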

 Create statistical UDFs.
 

 Key: HIVE-607
 URL: https://issues.apache.org/jira/browse/HIVE-607
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Emil Ibrishimov
Priority: Minor
 Fix For: 0.4.0

 Attachments: HIVE-607.1.patch, UDAFStddev.java


 Create UDFs replicating:
 STD() Return the population standard deviation
 STDDEV_POP()(v5.0.3)  Return the population standard deviation
 STDDEV_SAMP()(v5.0.3) Return the sample standard deviation
 STDDEV()  Return the population standard deviation
 SUM() Return the sum
 VAR_POP()(v5.0.3) Return the population standard variance
 VAR_SAMP()(v5.0.3)Return the sample variance
 VARIANCE()(v4.1)  Return the population standard variance
 as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions

2009-07-29 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-702:
--

Attachment: HIVE-702.1.patch

patch

 DROP TEMPORARY FUNCTION should not drop builtin functions
 -

 Key: HIVE-702
 URL: https://issues.apache.org/jira/browse/HIVE-702
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou
 Attachments: HIVE-702.1.patch


 Only temporary functions should be dropped. It should error out if the user 
 tries to drop built-in functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions

2009-07-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736762#action_12736762
 ] 

Min Zhou commented on HIVE-702:
---

Please wait a moment; I haven't dealt with the conflict you mentioned yet.

 DROP TEMPORARY FUNCTION should not drop builtin functions
 -

 Key: HIVE-702
 URL: https://issues.apache.org/jira/browse/HIVE-702
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou
 Attachments: HIVE-702.1.patch


 Only temporary functions should be dropped. It should error out if the user 
 tries to drop built-in functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions

2009-07-29 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-702:
--

Attachment: HIVE-702.2.patch

done

 DROP TEMPORARY FUNCTION should not drop builtin functions
 -

 Key: HIVE-702
 URL: https://issues.apache.org/jira/browse/HIVE-702
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou
 Attachments: HIVE-702.1.patch, HIVE-702.2.patch


 Only temporary functions should be dropped. It should error out if the user 
 tries to drop built-in functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions

2009-07-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736767#action_12736767
 ] 

Min Zhou commented on HIVE-702:
---

That patch hasn't been tested, because I'm at home and cannot connect to the 
company's VPN.

 DROP TEMPORARY FUNCTION should not drop builtin functions
 -

 Key: HIVE-702
 URL: https://issues.apache.org/jira/browse/HIVE-702
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou
 Attachments: HIVE-702.1.patch, HIVE-702.2.patch


 Only temporary functions should be dropped. It should error out if the user 
 tries to drop built-in functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-700) Fix test error by adding DROP FUNCTION

2009-07-28 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-700:
-

Assignee: Min Zhou

 Fix test error by adding DROP FUNCTION
 

 Key: HIVE-700
 URL: https://issues.apache.org/jira/browse/HIVE-700
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Min Zhou

 Since we added Show Functions in HIVE-580, test results will depend on what 
 temporary functions are added to the system.
 We should add the capability of DROP FUNCTION, and do that at the end of 
 those create function tests to make sure the show functions results are 
 deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-649) [UDF] now() for getting current time

2009-07-28 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-649:
--

Attachment: HIVE-649.patch

patch

 [UDF] now() for getting current time
 

 Key: HIVE-649
 URL: https://issues.apache.org/jira/browse/HIVE-649
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Min Zhou
 Attachments: HIVE-649.patch


 http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-700) Fix test error by adding DROP FUNCTION

2009-07-28 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-700:
--

Attachment: HIVE-700.1.patch

usage: 
drop function function_name

 Fix test error by adding DROP FUNCTION
 

 Key: HIVE-700
 URL: https://issues.apache.org/jira/browse/HIVE-700
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-700.1.patch


 Since we added Show Functions in HIVE-580, test results will depend on what 
 temporary functions are added to the system.
 We should add the capability of DROP FUNCTION, and do that at the end of 
 those create function tests to make sure the show functions results are 
 deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-700) Fix test error by adding DROP FUNCTION

2009-07-28 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736435#action_12736435
 ] 

Min Zhou commented on HIVE-700:
---

Sorry for being late; we have a training session today. I will upload a new 
patch for the HIVE-700-related JIRAs.

 Fix test error by adding DROP FUNCTION
 

 Key: HIVE-700
 URL: https://issues.apache.org/jira/browse/HIVE-700
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-700.1.patch, hive.700.2.patch


 Since we added Show Functions in HIVE-580, test results will depend on what 
 temporary functions are added to the system.
 We should add the capability of DROP FUNCTION, and do that at the end of 
 those create function tests to make sure the show functions results are 
 deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions

2009-07-28 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-702:
-

Assignee: Min Zhou

 DROP TEMPORARY FUNCTION should not drop builtin functions
 -

 Key: HIVE-702
 URL: https://issues.apache.org/jira/browse/HIVE-702
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou

 Only temporary functions should be dropped. It should error out if the user 
 tries to drop built-in functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-642) udf equivalent to string split

2009-07-21 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733641#action_12733641
 ] 

Min Zhou commented on HIVE-642:
---

It's very useful for us. 
Some comments:
# Can you implement it directly with Text? Avoiding string decoding and 
encoding would be faster (see the sketch after this list). Of course, that 
trick may lead to another problem, as String.split uses a regular expression 
for splitting.
# getDisplayString() always returns a string in lowercase. 
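
A rough sketch of the Text-based approach from point 1, assuming a single-byte 
delimiter: it scans the UTF-8 buffer directly and never decodes to String (and 
therefore deliberately gives up String.split's regular-expression semantics). 
It is an illustration, not the patch under review.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;

// Illustration only: split a Text on a single-byte delimiter without decoding.
public final class TextSplit {
  public static List<Text> split(Text input, byte delimiter) {
    List<Text> parts = new ArrayList<Text>();
    byte[] bytes = input.getBytes();        // backing buffer; only the first
    int len = input.getLength();            // getLength() bytes are valid
    int start = 0;
    for (int i = 0; i < len; i++) {
      if (bytes[i] == delimiter) {
        Text part = new Text();
        part.set(bytes, start, i - start);  // copy the slice [start, i)
        parts.add(part);
        start = i + 1;
      }
    }
    Text tail = new Text();
    tail.set(bytes, start, len - start);    // remainder after the last delimiter
    parts.add(tail);
    return parts;
  }
}
{code}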

 udf equivalent to string split
 --

 Key: HIVE-642
 URL: https://issues.apache.org/jira/browse/HIVE-642
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Emil Ibrishimov
 Fix For: 0.4.0

 Attachments: HIVE-642.1.patch, HIVE-642.2.patch


 It would be very useful to have a function equivalent to string split in java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-599) Embedded Hive SQL into Python

2009-07-20 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733429#action_12733429
 ] 

Min Zhou commented on HIVE-599:
---

I agree with Namit and Yongqiang.  I was thinking about creating functions with 
a format like the one below:
{noformat}
create function function_name (arguments list) as python {
python udf code
} 

create function function_name (arguments list) as java {
java udf code
} 
{noformat}
We can dynamically compile those kinds of code above, using Jython and 
com.sun.tools.javac respectively.

It would be better to store the Python or Java UDF bytecode in the persistent 
metastore (typically MySQL) after creation, so we can call that function again 
without a second function creation.
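
A minimal sketch of the "compile the inlined Java body at CREATE FUNCTION 
time" idea, using the standard javax.tools compiler API (the supported face of 
com.sun.tools.javac). It assumes the source declares no package; persisting 
the returned bytes in the metastore, as suggested above, is left out, and all 
names here are illustrative.
{code}
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Illustrative only: compile a Java UDF body handed in as a string and return
// the resulting bytecode, which could then be stored in the metastore.
public class InlineUdfCompiler {

  public static byte[] compile(String className, String javaSource) throws Exception {
    Path dir = Files.createTempDirectory("inline-udf");
    Path src = dir.resolve(className + ".java");
    Files.write(src, javaSource.getBytes("UTF-8"));

    JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // requires a JDK, not a bare JRE
    int status = javac.run(null, null, null, "-d", dir.toString(), src.toString());
    if (status != 0) {
      throw new IllegalArgumentException("Compilation of " + className + " failed");
    }
    // Assumes javaSource declares no package, so the .class lands directly in dir.
    return Files.readAllBytes(dir.resolve(className + ".class"));
  }
}
{code}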


 Embedded Hive SQL into Python
 -

 Key: HIVE-599
 URL: https://issues.apache.org/jira/browse/HIVE-599
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo

 While Hive does SQL, it would be very powerful to be able to embed that SQL in 
 languages like python in such a way that the hive query is also able to 
 invoke python functions seamlessly. One possibility is to explore integration 
 with Dumbo. Another is to see if the internal map_reduce.py tool can be open 
 sourced as a Hive contrib.
 Other thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-19 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733015#action_12733015
 ] 

Min Zhou commented on HIVE-512:
---

Can you answer my questions about these queries?


 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.2.patch, HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-19 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733016#action_12733016
 ] 

Min Zhou commented on HIVE-512:
---

select(1, '2', 3)
select(2, '2', 3)
select(1, true, 3)
select(2, 2.0, cast(3 as double))

If we don't uniformly return strings, it would be confusing for users to 
determine which type will be returned.

 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.2.patch, HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-19 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733103#action_12733103
 ] 

Min Zhou commented on HIVE-512:
---

If you inspect the implementation of case, you will see that it's unacceptable 
to use case with different argument types.
See GenericUDFCase.java and GenericUDFWhen.java:
{code}
hive> select case when true then '2' else 3 end from pokes limit 1;
FAILED: Error in semantic analysis: line 1:36 Argument Type Mismatch 3: The 
expression after ELSE should have the same type as those after THEN: string 
is expected but int is found
{code}

elt is a string function; confusion will be caused if we casually change its 
behavior. There is no need to make things more complex.

 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.2.patch, HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12732911#action_12732911
 ] 

Min Zhou commented on HIVE-512:
---

Here is the definition of elt: "Return string at index number." 
It's essentially a string function.
select elt(1, 2, 3) will return a varbinary in mysql, rather than an int. I 
still insist that returning string is better. 

Even if we do it as you said, what type of result will be returned for 
queries like the ones below?

select(1, '2', 3)
select(2, '2', 3)
select(1, true, 3)
select(2, 2.0, cast(3 as double))



 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.2.patch, HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12732828#action_12732828
 ] 

Min Zhou commented on HIVE-512:
---

Actually, elt returns only two types of results in mysql: varbinary and 
varchar. Varchar will be returned if all arguments are varchars; otherwise 
varbinary will be returned. 

mysql> create table t3 as select elt(1, 'a',  3);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t3;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| elt(1, 'a',  3) | varbinary(1) | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> create table t4 as select elt(1, true,  false);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t4;
+----------------------+--------------+------+-----+---------+-------+
| Field                | Type         | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| elt(1, true,  false) | varbinary(1) | YES  |     | NULL    |       |
+----------------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)


mysql> create table t5 as select elt(1, 2.0,  false);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t5;
+---------------------+--------------+------+-----+---------+-------+
| Field               | Type         | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| elt(1, 2.0,  false) | varbinary(4) | YES  |     | NULL    |       |
+---------------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)


Based on the above, I think it is better to return string, since string rather 
than binary is what is commonly used in Hive. 
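
For reference, here is a plain-Java illustration of the behavior argued for 
here: elt() always yields a string (or null), whatever the runtime types of 
its value arguments. It mirrors the MySQL examples in the issue description 
and is not the GenericUDF code from the attached patches.
{code}
// Plain-Java illustration only; not the Hive GenericUDF implementation.
public final class Elt {

  // Returns the n-th value rendered as a string, or null when n is out of range.
  public static String elt(int n, Object... values) {
    if (n < 1 || n > values.length) {
      return null;                       // MySQL also returns NULL here
    }
    Object chosen = values[n - 1];
    return chosen == null ? null : chosen.toString();
  }

  public static void main(String[] args) {
    System.out.println(elt(1, "ej", "Heja", "hej", "foo")); // ej
    System.out.println(elt(4, "ej", "Heja", "hej", "foo")); // foo
    System.out.println(elt(2, "2", 3));                     // 3, as a string
    System.out.println(elt(7, "ej", "Heja"));               // null
  }
}
{code}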


 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.2.patch, HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

2009-07-16 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731858#action_12731858
 ] 

Min Zhou commented on HIVE-541:
---

All test cases passed on my side; how about yours?

 Implement UDFs: INSTR and LOCATE
 

 Key: HIVE-541
 URL: https://issues.apache.org/jira/browse/HIVE-541
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-541.1.patch, HIVE-541.2.patch


 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
 These functions can be directly implemented with Text (instead of String). 
 This will make the test of whether one string contains another string much 
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-515) [UDF] new string function INSTR(str,substr)

2009-07-16 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou resolved HIVE-515.
---

Resolution: Duplicate

duplicates [#HIVE-541]

 [UDF] new string function INSTR(str,substr)
 ---

 Key: HIVE-515
 URL: https://issues.apache.org/jira/browse/HIVE-515
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-515-2.patch, HIVE-515.patch


 UDF for string function INSTR(str,substr)
 This extends the function from MySQL
 http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_instr
 usage:
  INSTR(str, substr)
  INSTR(str, substr, start)
 example:
 {code:sql}
 select instr('abcd', 'abc') from pokes;  // all results are '1'
 select instr('abcabc', 'ccc') from pokes;  // all results are '0'
 select instr('abcabc', 'abc', 2) from pokes;  // all results are '4'
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-649) [UDF] now() for getting current time

2009-07-16 Thread Min Zhou (JIRA)
[UDF] now() for getting current time


 Key: HIVE-649
 URL: https://issues.apache.org/jira/browse/HIVE-649
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Min Zhou


http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

2009-07-15 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731764#action_12731764
 ] 

Min Zhou commented on HIVE-541:
---

Hmm, that may be a good way. I will try it soon. 

 Implement UDFs: INSTR and LOCATE
 

 Key: HIVE-541
 URL: https://issues.apache.org/jira/browse/HIVE-541
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-541.1.patch


 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
 These functions can be directly implemented with Text (instead of String). 
 This will make the test of whether one string contains another string much 
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE

2009-07-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-541:
--

Attachment: HIVE-541.2.patch

Added a GenericUDFUtils.findText() in which string encoding and decoding are 
avoided, so execution will be faster.  
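
A hedged sketch of what a byte-level find over Text can look like; this is not 
the code in HIVE-541.2.patch, just an illustration of why skipping String 
encoding/decoding helps. Note that it returns a byte offset, whereas 
INSTR/LOCATE need character positions, so the real findText has more to do.
{code}
import org.apache.hadoop.io.Text;

// Illustration only: search directly over the UTF-8 buffers of two Texts.
public final class TextFind {

  /** Returns the byte offset of the first occurrence of needle in haystack
   *  at or after byte position start, or -1 if there is none. */
  public static int find(Text haystack, Text needle, int start) {
    byte[] h = haystack.getBytes();   // backing buffers; only the first
    byte[] n = needle.getBytes();     // getLength() bytes of each are valid
    int hLen = haystack.getLength();
    int nLen = needle.getLength();
    if (start < 0 || nLen > hLen - start) {
      return -1;
    }
    for (int i = start; i <= hLen - nLen; i++) {
      int j = 0;
      while (j < nLen && h[i + j] == n[j]) {
        j++;
      }
      if (j == nLen) {
        return i;                     // full match at byte offset i
      }
    }
    return -1;
  }
}
{code}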

 Implement UDFs: INSTR and LOCATE
 

 Key: HIVE-541
 URL: https://issues.apache.org/jira/browse/HIVE-541
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-541.1.patch, HIVE-541.2.patch


 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
 These functions can be directly implemented with Text (instead of String). 
 This will make the test of whether one string contains another string much 
 faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-329) start and stop hive thrift server in daemon mode

2009-07-13 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-329:
-

Assignee: Min Zhou

 start and stop hive thrift server  in daemon mode
 -

 Key: HIVE-329
 URL: https://issues.apache.org/jira/browse/HIVE-329
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: daemon.patch


 I wrote two shell scripts to start and stop the Hive Thrift server more conveniently.
 usage:
 bin/hive --service start-hive [HIVE_PORT]
 bin/hive --service stop-hive 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-12 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-555:
--

Attachment: HIVE-555-4.patch

Added a copy of UDAF to work around 
[HIVE-620|http://issues.apache.org/jira/browse/HIVE-620] so that all test 
cases pass.

 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch, 
 HIVE-555-4.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their UDAFs and generic UDFs, and generic UDAFs in 
 the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-618) More human-readable error prompt of FunctionTask

2009-07-09 Thread Min Zhou (JIRA)
More human-readable error prompt of FunctionTask


 Key: HIVE-618
 URL: https://issues.apache.org/jira/browse/HIVE-618
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Min Zhou


current prompt:
{noformat}
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.FunctionTask
{noformat}
Zheng suggested that something like the messages below would be better:
{noformat}
Class  not found
Class  does not implement UDF, GenericUDF, or UDAF
{noformat}
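
A small sketch of how the friendlier messages suggested above could be 
produced: resolve the class first, then check which supported base type it 
extends. This is only an illustration of the proposed wording, not the actual 
FunctionTask change.
{code}
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;

// Illustration only: returns an error message, or null when the class is usable.
public final class FunctionClassCheck {

  static String validate(String className) {
    Class<?> clazz;
    try {
      clazz = Class.forName(className);
    } catch (ClassNotFoundException e) {
      return "Class " + className + " not found";
    }
    if (UDF.class.isAssignableFrom(clazz)
        || GenericUDF.class.isAssignableFrom(clazz)
        || UDAF.class.isAssignableFrom(clazz)) {
      return null;
    }
    return "Class " + className + " does not implement UDF, GenericUDF, or UDAF";
  }
}
{code}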

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work started: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)

2009-07-09 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-512 started by Min Zhou.

 [GenericUDF] new string function ELT(N,str1,str2,str3,...) 
 ---

 Key: HIVE-512
 URL: https://issues.apache.org/jira/browse/HIVE-512
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-512.patch


 ELT(N,str1,str2,str3,...)
 Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less 
 than 1 or greater than the number of arguments. ELT() is the complement of 
 FIELD().
 {noformat}
 mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
 -> 'ej'
 mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
 -> 'foo'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-08 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-555:
--

Attachment: HIVE-555-2.patch

with unit tests.

 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch, HIVE-555-2.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their UDAFs and generic UDFs, and generic UDAFs in 
 the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-08 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728989#action_12728989
 ] 

Min Zhou commented on HIVE-555:
---

1. I thought it would be a common function for generic UDF error prompts. 
2. Is that required for an existing generic UDF? Regardless, I'll 
do it.


 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch, HIVE-555-2.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their UDAFs and generic UDFs, and generic UDAFs in 
 the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-08 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-555:
--

Attachment: HIVE-555-3.patch

Patch following Namit's comments.

 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their UDAFs and generic UDFs, and generic UDAFs in 
 the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-08 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729009#action_12729009
 ] 

Min Zhou commented on HIVE-555:
---

@Zheng
It would involve some logic outside of the FunctionTask. Actually, the execute 
methods of all Task classes are defined to return an integer standing for a 
status code. So creating another JIRA for that issue would be better. Agree?


 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their UDAFs and generic UDFs, and generic UDAFs in 
 the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-329) start and stop hive thrift server in daemon mode

2009-07-08 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729012#action_12729012
 ] 

Min Zhou commented on HIVE-329:
---

start needs a port number, but stop doesn't.

 start and stop hive thrift server  in daemon mode
 -

 Key: HIVE-329
 URL: https://issues.apache.org/jira/browse/HIVE-329
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
 Attachments: daemon.patch


 I wrote two shell scripts to start and stop the Hive Thrift server more conveniently.
 usage:
 bin/hive --service start-hive [HIVE_PORT]
 bin/hive --service stop-hive 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-07-07 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727999#action_12727999
 ] 

Min Zhou commented on HIVE-537:
---

Zheng, how would you get a field value from an object without an ordinal?


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-537.1.patch


 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags
*/
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object.
*/
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object.
*/
   Object getField(Object o);
 };
 An example serialization format (using a delimited format, with ' ' as 
 the first-level delimiter and '=' as the second-level delimiter):
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-07-05 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-537:
--

Attachment: HIVE-537.1.patch

HIVE-537.1.patch

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-537.1.patch


 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags
*/
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object.
*/
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object.
*/
   Object getField(Object o);
 };
 An example serialization format (using a delimited format, with ' ' as 
 first-level delimiter and '=' as second-level delimiter)
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-30 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725532#action_12725532
 ] 

Min Zhou commented on HIVE-537:
---

Even if UnionObjectInspector has been implemented, DynamicSerDe does not seem 
to support a schema with a union type, which Thrift can't recognize.
We must find a way to solve it; any suggestions?  

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao

 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags
*/
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object.
*/
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object.
*/
   Object getField(Object o);
 };
 An example serialization format (using a delimited format, with ' ' as 
 first-level delimiter and '=' as second-level delimiter)
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-30 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725564#action_12725564
 ] 

Min Zhou commented on HIVE-577:
---

Passed all test cases on Hadoop 0.17.0 - 0.19.1.

 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch, HIVE-577.2.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-29 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-577:
--

Attachment: HIVE-577.1.patch

Can retrieve all columns' comments now.

 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-29 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-577:
--

Attachment: HIVE-577.2.patch

@Prasad 
I considered the case you mentioned before I uploaded that patch; I just didn't 
know what the code meant. 

This patch should address the issue.

 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch, HIVE-577.2.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725450#action_12725450
 ] 

Min Zhou commented on HIVE-577:
---

I guess it's cumbersome to deal with custom tables through the API Hive currently 
provides. 
The DDL for the schema should change from 
  struct{ type1 col1, type2 col2}
to some format like
  struct{ struct{type1 col1, string comment1},  struct{type2 col2, string 
comment2}}

However, MetaStoreUtils.getDDLFromFieldSchema(structName, fieldSchemas) is not 
used only for getSchema(table). 


 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch, HIVE-577.2.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725450#action_12725450
 ] 

Min Zhou edited comment on HIVE-577 at 6/29/09 8:15 PM:


I guess it's cumbersome to deal with custom tables through the API currently 
provided by Hive. 

The DDL for the table schema should change from 
  struct{ type1 col1, type2 col2}
to some format like
  struct{ struct{type1 col1, string comment1},  struct{type2 col2, string 
comment2}}

However, MetaStoreUtils.getDDLFromFieldSchema(structName, fieldSchemas) is not 
used only for getSchema(table). 



  was (Author: coderplay):
I guessed it's cumbersome to deal with custom tables from current api 
provided by hive currently. 
ddl for schema should changed from 
  struct{ type1 col1, type2 col2}
to some format like
  struct{ struct{type1 col1, string comment1},  struct{type2 col2, string 
comment2}}

however, MetaStoreUtils.getDDLFromFieldSchema(structName,  fieldSchemas) is not 
only for getSchema(table). 


  
 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch, HIVE-577.2.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-29 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725473#action_12725473
 ] 

Min Zhou commented on HIVE-577:
---

Any suggestions on this, or will you accept the 2nd patch, Prasad?

 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: HIVE-577.1.patch, HIVE-577.2.patch


 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-27 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724916#action_12724916
 ] 

Min Zhou commented on HIVE-537:
---

We've done a test on this issue; dataset: 700M records.

With the first approach, each distinct count needs 119 seconds, which means 10 distinct 
counts need at least 1190 seconds.
With the second approach, where distinct keys were distinguished by a tag, 10 distinct 
counts need 148 seconds.

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao

 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags
*/
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object.
*/
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object.
*/
   Object getField(Object o);
 };
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-576) complete jdbc driver

2009-06-25 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724060#action_12724060
 ] 

Min Zhou commented on HIVE-576:
---

Dones & To Dos:

# removed all useless comments auto-generated by Eclipse
# added APL statements for each file
# fixed a bug where SemanticAnalyzer.getSchema() fails after doing select-all queries 
on tables that have partitions, i.e. queries like select * from tbl where 
partition_name=value
# implemented HiveResultSetMetadata, HiveDatabaseMetadata
# HiveResultSet supports getXXX(columnName) now
# removed JdbcSessionState, which was not used 
# supported SQL Explorer for manipulating Hive data through a GUI
# todo: implement HivePreparedStatement & HiveCallableStatement (a rough usage sketch of the driver follows below)
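For reference, a rough smoke-test sketch exercising the pieces above (ResultSetMetaData plus the getXXX accessors); the driver class name, connection URL, and the src table follow common Hive JDBC conventions and are assumptions that may differ from the attached patch:
{code:java}
import java.sql.*;

public class HiveJdbcSmokeTest {
  public static void main(String[] args) throws Exception {
    // Assumed driver class and URL; adjust host/port/database for your setup.
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT * FROM src");   // 'src' is a placeholder table

    ResultSetMetaData md = rs.getMetaData();                 // backed by HiveResultSetMetadata
    for (int i = 1; i <= md.getColumnCount(); i++) {
      System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
    }
    while (rs.next()) {
      System.out.println(rs.getString(1));                   // getXXX(columnName) should work as well
    }
    con.close();
  }
}
{code}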

 complete jdbc driver
 

 Key: HIVE-576
 URL: https://issues.apache.org/jira/browse/HIVE-576
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-576.1.patch, HIVE-576.2.patch, sqlexplorer.jpg


 Hive only supports a few JDBC interfaces; let's complete it. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-24 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: (was: tables.jpg)

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-24 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: (was: sqlexplorer.jpg)

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-576) complete jdbc driver

2009-06-24 Thread Min Zhou (JIRA)
complete jdbc driver


 Key: HIVE-576
 URL: https://issues.apache.org/jira/browse/HIVE-576
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


Hive only supports a few JDBC interfaces; let's complete it. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-24 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723526#action_12723526
 ] 

Min Zhou commented on HIVE-567:
---

It's not elegant to get the schema from the HiveServer by means of adding a 
function getFullDDLFromFieldSchema.

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-24 Thread Min Zhou (JIRA)
return correct comment of a column from ThriftHiveMetastore.Iface.get_fields


 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou


The comment of each column is not retrieved correctly right now; 
FieldSchema.getComment() will return a string from the deserializer.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-573) TestHiveServer broken

2009-06-24 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723841#action_12723841
 ] 

Min Zhou commented on HIVE-573:
---

It's a good way to use JSON through Avro here, but it makes things more complex: 
SerDe (although it is not an RPC), Thrift, Avro, three duplications of work.  

 TestHiveServer broken
 -

 Key: HIVE-573
 URL: https://issues.apache.org/jira/browse/HIVE-573
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Server Infrastructure
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-573.1.patch


 This was after the change to HIVE-567 was committed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work started: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields

2009-06-24 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-577 started by Min Zhou.

 return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
 

 Key: HIVE-577
 URL: https://issues.apache.org/jira/browse/HIVE-577
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Min Zhou
Assignee: Min Zhou

 The comment of each column is not retrieved correctly right now; 
 FieldSchema.getComment() will return a string from the deserializer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-574) Hive should use ClassLoader from hadoop Configuration

2009-06-24 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723874#action_12723874
 ] 

Min Zhou commented on HIVE-574:
---

+1 for Zheng, thanks
It worked fine here, nothing abnormal.

 Hive should use ClassLoader from hadoop Configuration
 -

 Key: HIVE-574
 URL: https://issues.apache.org/jira/browse/HIVE-574
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-574.1.patch, HIVE-574.2.patch, HIVE-574.3.patch


 See HIVE-338.
 Hive should always use the getClassByName method from hadoop Configuration, 
 so that we choose the correct ClassLoader. Examples include all plug-in 
 interfaces, including UDF/GenericUDF/UDAF, SerDe, and FileFormats. Basically 
 the following code snippet shows the idea:
 {code}
 package org.apache.hadoop.conf;
 public class Configuration implements Iterable<Map.Entry<String,String>> {
...
   /**
* Load a class by name.
* 
* @param name the class name.
* @return the class object.
* @throws ClassNotFoundException if the class is not found.
*/
   public Class<?> getClassByName(String name) throws ClassNotFoundException {
 return Class.forName(name, true, classLoader);
   }
 {code}
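 A minimal sketch of the calling pattern this asks for on the Hive side; the helper class and method below are illustrative assumptions, only Configuration.getClassByName from the snippet above is taken as given:
 {code:java}
 import org.apache.hadoop.conf.Configuration;

 public class PluginLoadingSketch {
   // Resolve a plug-in class (UDF, SerDe, file format, ...) through the job
   // Configuration instead of Class.forName, so the Configuration's ClassLoader
   // (which can see jars added at runtime) is the one consulted.
   public static Class<?> loadPluginClass(Configuration conf, String className)
       throws ClassNotFoundException {
     return conf.getClassByName(className);
   }
 }
 {code}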

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

2009-06-24 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-559:
--

Issue Type: Sub-task  (was: New Feature)
Parent: HIVE-576

 Support JDBC ResultSetMetadata
 --

 Key: HIVE-559
 URL: https://issues.apache.org/jira/browse/HIVE-559
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Clients
Reporter: Bill Graham
Assignee: Min Zhou

 Support ResultSetMetadata for JDBC ResultSets. The getColumn* methods would 
 be particularly useful I'd expect:
 http://java.sun.com/javase/6/docs/api/java/sql/ResultSetMetaData.html
 The challenge as I see it though, is that the JDBC client only has access to 
 the raw query string and the result data when running in standalone mode. 
 Therefore, it will need to get the column metadata one of two ways: 
 1. By parsing the query to determine the tables/columns involved and then 
 making a request to the metastore to get the metadata for the columns. This 
 certainly feels like duplicate work, since the query of course gets properly 
 parsed on the server.
 2. By returning the column metadata from the server. My thrift knowledge is 
 limited, but I suspect adding this to the response would present other 
 challenges.
 Any thoughts or suggestions? Option #1 feels clunkier, yet safer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-23 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: sqlexplorer.jpg

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, 
 sqlexplorer.jpg, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-338) Executing cli commands into thrift server

2009-06-23 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723400#action_12723400
 ] 

Min Zhou commented on HIVE-338:
---

Can you explain why you made a change in FunctionTask.java? It caused a 
java.lang.ClassNotFoundException when I executed my UDF. 
The ClassLoader did not work.

 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
Assignee: Zheng Shao
 Fix For: 0.4.0

 Attachments: hive-338.final.patch, HIVE-338.postfix.1.patch, 
 hiveserver-v1.patch, hiveserver-v2.patch, hiveserver-v3.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HIVE-338) Executing cli commands into thrift server

2009-06-23 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723400#action_12723400
 ] 

Min Zhou edited comment on HIVE-338 at 6/23/09 7:28 PM:


Can you explain why you made a change in FunctionTask.java? It caused a 
java.lang.ClassNotFoundException when I executed my UDF while the MR jobs were 
submitted by the Hive CLI. 
The ClassLoader did not work.

  was (Author: coderplay):
Can you exlain why you made a change at FunctionTask .java? It caused a 
java.lang.ClassNotFoundException when I executing my udf. 
ClassLoader didnot work.
  
 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
Assignee: Zheng Shao
 Fix For: 0.4.0

 Attachments: hive-338.final.patch, HIVE-338.postfix.1.patch, 
 hiveserver-v1.patch, hiveserver-v2.patch, hiveserver-v3.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: tables.jpg

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-567:
-

Assignee: Min Zhou  (was: Raghotham Murthy)

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: (was: result.jpg)

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Attachment: result.jpg

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-567:
-

Assignee: Raghotham Murthy  (was: Min Zhou)

incorrect manipulation

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2009-06-22 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-567:
--

Comment: was deleted

(was: incorrect manipulation)

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg


 Instead of trying to get a complete implementation of jdbc, it's probably more 
 useful to pick reporting/analytics software out there and implement the jdbc 
 methods necessary to get them working. This jira is a first attempt at this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721169#action_12721169
 ] 

Min Zhou commented on HIVE-521:
---

I didn't think all tests would pass, due to the missing class 
BinaryComparable. The reason for the failure has nothing to do with this JIRA. You 
can check out the trunk and do 
ant -Dhadoop.version=0.17.0 test -Doverwrite=true
and the error message will be displayed.
...
[junit] Exception: org/apache/hadoop/io/BinaryComparable
[junit] java.lang.NoClassDefFoundError: 
org/apache/hadoop/io/BinaryComparable
[junit] at java.lang.Class.getDeclaredConstructors0(Native Method)
[junit] at 
java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
[junit] at java.lang.Class.getConstructor0(Class.java:2699)
[junit] at java.lang.Class.newInstance0(Class.java:326)
[junit] at java.lang.Class.newInstance(Class.java:308)
[junit] at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getUDFMethod(FunctionRegistry.java:309)
[junit] at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDesc(TypeCheckProcFactory.java:451)
[junit] at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:558)
[junit] at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:653)
[junit] at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:80)
[junit] at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:83)
[junit] at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:116)
[junit] at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:95)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:3922)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1000)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:986)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3163)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:3610)
[junit] at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3840)
[junit] at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
[junit] at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:44)
[junit] at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
[junit] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
[junit] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)
[junit] at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
[junit] at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
[junit] at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:471)
[junit] at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity(TestCliDriver.java:726)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
[junit] Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.io.BinaryComparable
[junit] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
[junit] at 

[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721231#action_12721231
 ] 

Min Zhou commented on HIVE-521:
---

@ HIVE-521-all-v7.patch 

# {code:java}
boolean conditionTypeIsOk = (arguments[0].getCategory() == 
ObjectInspector.Category.PRIMITIVE);
if (conditionTypeIsOk) {
  PrimitiveObjectInspector poi = ((PrimitiveObjectInspector)arguments[0]);
  conditionTypeIsOk = (poi.getPrimitiveCategory() == 
PrimitiveObjectInspector.PrimitiveCategory.BOOLEAN
   || poi.getPrimitiveCategory() == 
PrimitiveObjectInspector.PrimitiveCategory.VOID);
}
if (!conditionTypeIsOk) {
  throw new UDFArgumentTypeException(0,
      "The first argument of function IF should be \"" + Constants.BOOLEAN_TYPE_NAME
      + "\", but \"" + arguments[0].getTypeName() + "\" is found");
}
{code}
# {code:java}
String typeName = arguments[0].getTypeName();
if (!typeName.equals(Constants.BOOLEAN_TYPE_NAME)
    && !typeName.equals(Constants.VOID_TYPE_NAME)) {
  throw new UDFArgumentTypeException(0,
      "The first expression of function IF is expected to be \"" + Constants.BOOLEAN_TYPE_NAME
      + "\", but \"" + arguments[0].getTypeName() + "\" is found");
}
{code}

I thought the 2nd approach is more concise; do you think so?

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, 
 HIVE-521-all-v6.patch, HIVE-521-all-v7.patch, HIVE-521-IF-2.patch, 
 HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, 
 HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.
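 As a rough idea of what such a move looks like, here is a minimal GenericUDF-style skeleton for an isnull-like function. It is only a sketch against the GenericUDF API as it exists in later Hive versions (initialize/evaluate/getDisplayString); the class name, the exception used, and the exact signatures are assumptions and not the code in the attached patches:
 {code:java}
 // Sketch only; imports from org.apache.hadoop.hive.ql.udf.generic,
 // org.apache.hadoop.hive.serde2.objectinspector.* and org.apache.hadoop.hive.ql.*
 // omitted for brevity. Exact signatures may differ between Hive versions.
 public class GenericUDFIsNullSketch extends GenericUDF {

   @Override
   public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentTypeException {
     if (arguments.length != 1) {
       throw new UDFArgumentTypeException(0, "isnull takes exactly one argument");
     }
     // Any input type is accepted; the result is always a boolean.
     return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
   }

   @Override
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
     return Boolean.valueOf(arguments[0].get() == null);
   }

   @Override
   public String getDisplayString(String[] children) {
     return "isnull(" + children[0] + ")";
   }
 }
 {code}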

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-18 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721595#action_12721595
 ] 

Min Zhou commented on HIVE-521:
---

OK, we are hairsplitting. Passed all tests here; let's commit it.
+1

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, 
 HIVE-521-all-v6.patch, HIVE-521-all-v7.patch, HIVE-521-IF-2.patch, 
 HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, 
 HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-17 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-521:
--

Attachment: HIVE-521-all-v5.patch

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, 
 HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, 
 HIVE-521-IF-5.patch, HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-17 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-521:
--

Attachment: HIVE-521-all-v6.patch

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, 
 HIVE-521-all-v6.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, 
 HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-564) sweep the non-open source elements from hive

2009-06-16 Thread Min Zhou (JIRA)
sweep the non-open source elements from hive


 Key: HIVE-564
 URL: https://issues.apache.org/jira/browse/HIVE-564
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


There are some non-open-source things from Facebook in the current version of Hive. 
We should replace them with open-source versions of fb303.jar, libthrift.jar, 
etc., so that the open-source community is more likely to amend the relevant code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-16 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-521:
--

Attachment: HIVE-521-all-v4.patch

passed tests on hadoop version 0.17.0.

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-IF-2.patch, 
 HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, 
 HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-338) Executing cli commands into thrift server

2009-06-16 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720462#action_12720462
 ] 

Min Zhou commented on HIVE-338:
---

I think you should take a look at these lines of 
org.apache.hadoop.conf.Configuration
{code:java}
  private ClassLoader classLoader;
  {
classLoader = Thread.currentThread().getContextClassLoader();
if (classLoader == null) {
  classLoader = Configuration.class.getClassLoader();
}
  }
...

  public Class<?> getClassByName(String name) throws ClassNotFoundException {
return Class.forName(name, true, classLoader);
  }
{code}

The ClassLoader of the current thread changed when jars were added to the classpath, but conf 
has not synchronously picked up that change. 
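A minimal sketch of the kind of synchronization this implies, assuming the standard Configuration getClassLoader/setClassLoader accessors; where such a hook would actually live in Hive is left open:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConfClassLoaderSync {
  // After "add jar" swaps the thread context ClassLoader, push the new loader
  // into the Configuration so that conf.getClassByName(...) sees the added jars.
  public static void syncClassLoader(Configuration conf) {
    ClassLoader current = Thread.currentThread().getContextClassLoader();
    if (current != null && current != conf.getClassLoader()) {
      conf.setClassLoader(current);
    }
  }
}
{code}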

 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
Assignee: Min Zhou
 Attachments: hiveserver-v1.patch, hiveserver-v2.patch, 
 hiveserver-v3.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-556) let hive support theta join

2009-06-15 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719518#action_12719518
 ] 

Min Zhou commented on HIVE-556:
---

I didn't see any filter there; Hive will put all fields of my small table into the 
HTree.

{noformat}
hive> explain select /*+ MAPJOIN(a) */ a.url_pattern, w.url from application a 
join web_log w where w.logdate='20090611' and w.url rlike a.url_pattern and 
a.dt='20090609';

Common Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 {bussiness_id} {subclass_id} {class_id} {note} {name} 
{url_pattern} {dt}
1
{noformat}

We only put a.url_pattern into a HashMap in our raw map-reduce implementation.
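A plain-Java sketch of that hand-rolled map-side approach, keeping only the compiled url_pattern column in memory; the pattern strings and URLs below are made-up examples:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MapSideThetaJoinSketch {
  // Build the in-memory side of the join: only url_pattern, pre-compiled.
  static List<Pattern> loadPatterns(List<String> urlPatterns) {
    List<Pattern> compiled = new ArrayList<Pattern>();
    for (String p : urlPatterns) {
      compiled.add(Pattern.compile(p));
    }
    return compiled;
  }

  // For each big-table row (a url from web_log), emit every pattern it matches.
  static void joinOneRow(String url, List<Pattern> patterns) {
    for (Pattern p : patterns) {
      if (p.matcher(url).find()) {               // rlike-style "matches somewhere" semantics
        System.out.println(p.pattern() + "\t" + url);
      }
    }
  }

  public static void main(String[] args) {
    List<String> urlPatterns = new ArrayList<String>();
    urlPatterns.add("^http://example\\.com/item/.*");   // hypothetical url_pattern rows
    urlPatterns.add(".*\\.jsp$");
    List<Pattern> patterns = loadPatterns(urlPatterns);
    joinOneRow("http://example.com/item/42", patterns);
    joinOneRow("http://other.com/index.jsp", patterns);
  }
}
{code}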

 let hive support theta join
 ---

 Key: HIVE-556
 URL: https://issues.apache.org/jira/browse/HIVE-556
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


 Right now, Hive only supports equi-joins. Sometimes that's not enough; we 
 must consider implementing theta joins like
 {code:sql}
 SELECT
   a.subid, a.id, t.url
 FROM
   tbl t JOIN aux_tbl a ON t.url rlike a.url_pattern
 WHERE
   t.dt='20090609'
   AND a.dt='20090609';
 {code}
 Any condition expression following 'ON' would be appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-559) Support JDBC ResultSetMetadata

2009-06-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-559:
-

Assignee: Min Zhou

 Support JDBC ResultSetMetadata
 --

 Key: HIVE-559
 URL: https://issues.apache.org/jira/browse/HIVE-559
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Clients
Reporter: Bill Graham
Assignee: Min Zhou

 Support ResultSetMetadata for JDBC ResultSets. The getColumn* methods would 
 be particularly useful I'd expect:
 http://java.sun.com/javase/6/docs/api/java/sql/ResultSetMetaData.html
 The challenge as I see it though, is that the JDBC client only has access to 
 the raw query string and the result data when running in standalone mode. 
 Therefore, it will need to get the column metadata one of two ways: 
 1. By parsing the query to determine the tables/columns involved and then 
 making a request to the metastore to get the metadata for the columns. This 
 certainly feels like duplicate work, since the query of course gets properly 
 parsed on the server.
 2. By returning the column metadata from the server. My thrift knowledge is 
 limited, but I suspect adding this to the response would present other 
 challenges.
 Any thoughts or suggestions? Option #1 feels clunkier, yet safer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HIVE-474) Support for distinct selection on two or more columns

2009-06-14 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719368#action_12719368
 ] 

Min Zhou edited comment on HIVE-474 at 6/14/09 7:02 PM:


I thought there is another special case here.  If the query has multiple 
distinct operations on the same column, we can push down the evaluation of 
those expressions into reducers.
{code}
Query:
  select a, count(distinct if(condition, b, null)) as col1, count(distinct 
if(!condition, null, b)) as col2, count(distinct b) as col3

Plan:
  Job :
Map side:
  Emit: distribution_key: a, sort_key: a, b, value: nothing
Reduce side:
  Group By
a,  count col1, col2, col3 by evaluating their expressions
{code}

  was (Author: coderplay):
I thought there is another special case here.  If the query has multiple 
distinct operations on the same column , we can push down the evaluation of 
those expressions into reducers.

Query:
  select a, count(distinct if(codition, b, null)) as col1, count(distinct 
if(!condition, null, b)) as col2, count(distinct b) as col3

Plan:
  Job :
Map side:
  Emit: distribution_key: a, sort_key: a, b, value: nothing
Reduce side:
  Group By
a,  count col1, col2, col3 by evaluating their expressions
  
 Support for distinct selection on two or more columns
 -

 Key: HIVE-474
 URL: https://issues.apache.org/jira/browse/HIVE-474
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Alexis Rondeau

 The ability to select distinct on several individual columns, for example: 
 select count(distinct user), count(distinct session) from actions;   
 Currently returns the following failure: 
 FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
 not Supported user

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2009-06-14 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719368#action_12719368
 ] 

Min Zhou commented on HIVE-474:
---

I thought there is another special case here.  If the query has multiple 
distinct operations on the same column, we can push down the evaluation of 
those expressions into reducers.

Query:
  select a, count(distinct if(condition, b, null)) as col1, count(distinct 
if(!condition, null, b)) as col2, count(distinct b) as col3

Plan:
  Job :
Map side:
  Emit: distribution_key: a, sort_key: a, b, value: nothing
Reduce side:
  Group By
a,  count col1, col2, col3 by evaluating their expressions

 Support for distinct selection on two or more columns
 -

 Key: HIVE-474
 URL: https://issues.apache.org/jira/browse/HIVE-474
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Alexis Rondeau

 The ability to select distinct on several individual columns, for example: 
 select count(distinct user), count(distinct session) from actions;   
 Currently returns the following failure: 
 FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
 not Supported user

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HIVE-338) Executing cli commands into thrift server

2009-06-11 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717651#action_12717651
 ] 

Min Zhou edited comment on HIVE-338 at 6/10/09 11:37 PM:
-

* exec/FunctionTask.java: is it necessary to specify the loader in the 
Class.forName call? I thought that the current thread context loader was 
always the first loader to be tried anyway during name resolution.
Yes, of course. The class loader held by HiveConf is older than that of the 
current thread.

This patch supports dfs, add/delete file/jar, and set now.  

By the way, Joydeep, would you do me a favor and write some test code, which I am not 
familiar with? You know, 'add jar' needs a separate jar, and I am not quite sure 
how to organize them.

  was (Author: coderplay):

* exec/FunctionTask.java: is it necessary to specify the loader in the 
Class.forName call? I thought that that the current thread context loader was 
the always the first loader to be tried anyway during name resolution.
Yes, of course. the class loader holding by HiveConf is older than that of 
current thread.

this pacth support dfs, add/delete file/jar, set now.  

btw, Joydeep, would you do me a favor writing some test code that I' am not 
familiar with it ?  you know, ' add jar'  need a separate jar, and i not quite 
sure how to organize them.
  
 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
 Attachments: hiveserver-v1.patch, hiveserver-v2.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-11 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718373#action_12718373
 ] 

Min Zhou commented on HIVE-537:
---

First approach:
  O(mN/p) + O(m(N/p log (N/p))) + O(mN/r) + O(m)
I don't agree with you about this O(m); it would indeed be a very large cost.  
Meanwhile, you should also add the cost of joining all the results into one at 
the end. 

For the second approach, I think it should be  
  O(N/p) + O(mN/p log (mN/p)) + O(mN/r)  

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao

 There are already some cases inside the code where we use heterogeneous data: 
 JoinOperator and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use the Operator's parentID to distinguish them. However, that 
 approach does not extend to the more complex plans that might be needed in 
 the future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >

 Example:
   union<0:int, 1:double, 2:array<string>, 3:struct<a:int,b:string>>

 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On
   deserialization, we will first read out the tag byte; then we know the
   current type of the following object, so we can deserialize it
   successfully.

 Interface for ObjectInspector:

 interface UnionObjectInspector {
   /** Returns the array of OIs, one for each of the tags. */
   ObjectInspector[] getObjectInspectors();
   /** Returns the tag of the object. */
   byte getTag(Object o);
   /** Returns the field based on the tag value associated with the Object. */
   Object getField(Object o);
 }
 {code}
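
 As a rough illustration of the "tag byte first" layout described above, a
 hedged sketch follows. It is not Hive's SerDe code; the stream types, the
 tag-to-type mapping, and the value encodings are assumptions for two tags only.

 {code}
 // Illustrative sketch: write the tag byte, then the value; read the tag byte
 // first, then dispatch on it to decode the value.  Placeholder encodings.
 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;

 public class UnionSketch {
   /** Serialize: tag byte first, then the value for that tag (0=int, 1=double). */
   public static void write(DataOutput out, byte tag, Object value) throws IOException {
     out.writeByte(tag);
     switch (tag) {
       case 0: out.writeInt((Integer) value); break;
       case 1: out.writeDouble((Double) value); break;
       default: throw new IOException("unknown tag " + tag);
     }
   }

   /** Deserialize: read the tag byte first, so we know the type of what follows. */
   public static Object read(DataInput in) throws IOException {
     byte tag = in.readByte();
     switch (tag) {
       case 0: return in.readInt();
       case 1: return in.readDouble();
       default: throw new IOException("unknown tag " + tag);
     }
   }
 }
 {code}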

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-556) let hive support theta join

2009-06-11 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718687#action_12718687
 ] 

Min Zhou commented on HIVE-556:
---

It's very common for us, and it has blocked us badly. We often have one or more 
aux tables with about 10k records that the major table would do theta joins 
on. I don't think the current solution by means of a cartesian product is a 
good way; it brings terrible sorting and I/O overhead for us.


 let hive support theta join
 ---

 Key: HIVE-556
 URL: https://issues.apache.org/jira/browse/HIVE-556
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


 Right now, Hive only supports equi-joins. Sometimes that's not enough; we 
 must consider implementing theta joins like
 {code:sql}
 SELECT
   a.subid, a.id, t.url
 FROM
   tbl t JOIN aux_tbl a ON t.url rlike a.url_pattern
 WHERE
   t.dt='20090609'
   AND a.dt='20090609';
 {code}
 Any condition expression following 'ON' should be accepted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-556) let hive support theta join

2009-06-11 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718699#action_12718699
 ] 

Min Zhou commented on HIVE-556:
---

@Ashish
I agree with you, a map-side join is okay. However, it does not support theta 
joins right now. We used to load the aux tables into the memory of each map 
node, scan the major tables, and do our joins there.
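
For context, the workaround described above can be sketched as a plain 
map-side join: each mapper loads the small aux table into memory and applies 
the rlike predicate while streaming the major table. This is a hypothetical 
illustration using the old Hadoop mapred API, not the code we actually ran; 
the field layout, delimiters, and the aux-file property name are assumptions.

{code}
// Hypothetical map-side theta join: aux rows (subid, id, url_pattern) are
// loaded in configure(); each major-table row is matched against every
// pattern in map() and the joined record is emitted on a hit.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ThetaJoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final List<String[]> auxRows = new ArrayList<String[]>();
  private final List<Pattern> patterns = new ArrayList<Pattern>();

  public void configure(JobConf job) {
    // Assumed property pointing at a local copy of the small aux table.
    String auxPath = job.get("theta.join.aux.file");
    try {
      BufferedReader in = new BufferedReader(new FileReader(auxPath));
      String line;
      while ((line = in.readLine()) != null) {
        String[] fields = line.split("\t");   // subid, id, url_pattern
        auxRows.add(fields);
        patterns.add(Pattern.compile(fields[2]));
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("cannot load aux table", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    String[] fields = value.toString().split("\t");   // url, ...
    String url = fields[0];
    // In-memory nested loop: emit one joined row per matching aux row.
    for (int i = 0; i < auxRows.size(); i++) {
      if (patterns.get(i).matcher(url).find()) {
        output.collect(new Text(url),
            new Text(auxRows.get(i)[0] + "\t" + auxRows.get(i)[1]));
      }
    }
  }
}
{code}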

 let hive support theta join
 ---

 Key: HIVE-556
 URL: https://issues.apache.org/jira/browse/HIVE-556
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


 Right now, Hive only supports equi-joins. Sometimes that's not enough; we 
 must consider implementing theta joins like
 {code:sql}
 SELECT
   a.subid, a.id, t.url
 FROM
   tbl t JOIN aux_tbl a ON t.url rlike a.url_pattern
 WHERE
   t.dt='20090609'
   AND a.dt='20090609';
 {code}
 Any condition expression following 'ON' should be accepted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-06-10 Thread Min Zhou (JIRA)
create temporary function support not only udf, but also udaf,  genericudf, etc.


 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0


Right now, the command 'create temporary function' only supports UDFs. 
We can also let users write their own UDAFs and generic UDFs now, and generic 
UDAFs in the future. 
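
For reference, a user-written generic UDF that such an extended 'create 
temporary function' could register might look roughly like the skeleton below. 
This is not part of the patch; the package names and method signatures are 
recalled from the 0.4-era API and may not match exactly, and the class and 
function names are made up.

{code}
// Rough skeleton of a GenericUDF (identity on its first argument).
// Argument-count checking is omitted for brevity.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

public class GenericUDFFirstArg extends GenericUDF {

  // Called once with the inspectors of the actual arguments; the returned
  // inspector describes the result type of the function.
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    // Identity: the result has the same type as the first argument.
    return arguments[0];
  }

  // Called per row; DeferredObject lets the argument be evaluated lazily.
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    return arguments[0].get();
  }

  public String getDisplayString(String[] children) {
    return "first_arg(" + children[0] + ")";
  }
}
{code}

Once registration supports generic UDFs, something like "create temporary 
function first_arg as 'GenericUDFFirstArg';" would presumably expose it 
(again, the names here are hypothetical).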

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-06-10 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-555:
--

Attachment: HIVE-555-1.patch

patch w/o testcase

 create temporary function support not only udf, but also udaf,  genericudf, 
 etc.
 

 Key: HIVE-555
 URL: https://issues.apache.org/jira/browse/HIVE-555
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Min Zhou
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-555-1.patch


 Right now, the command 'create temporary function' only supports UDFs. 
 We can also let users write their own UDAFs and generic UDFs now, and generic 
 UDAFs in the future. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-10 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-521:
--

Attachment: HIVE-521-all-v2.patch

Fixed the issues Zheng commented on; UDFArgumentException and 
UDFArgumentLengthException have been added.
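
For readers following the patch, the kind of check these exception types 
enable inside a GenericUDF's initialize() is sketched below. This is only a 
fragment of a hypothetical isnull-style class (imports and the rest of the 
class are omitted), not the actual patch code, and the message wording is 
made up.

{code}
// Hedged fragment: argument validation in initialize() using the new
// exception types; assumes the usual ObjectInspector imports.
public ObjectInspector initialize(ObjectInspector[] arguments)
    throws UDFArgumentException {
  if (arguments.length != 1) {
    // Wrong number of arguments: report it with a clear message.
    throw new UDFArgumentLengthException(
        "The function ISNULL accepts exactly 1 argument, but "
        + arguments.length + " were given.");
  }
  // isnull-style functions take any argument type and return a boolean.
  return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
}
{code}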


 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, 
 HIVE-521-IF-5.patch, HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF

2009-06-10 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-521:
--

Attachment: HIVE-521-all-v3.patch

catch UDFArgumentLengthException.

 Move size, if, isnull, isnotnull to GenericUDF
 --

 Key: HIVE-521
 URL: https://issues.apache.org/jira/browse/HIVE-521
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Min Zhou
 Fix For: 0.4.0

 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, 
 HIVE-521-all-v3.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, 
 HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch


 See HIVE-511 for an example of the move.
 size, if, isnull, isnotnull are all implemented with UDF but they are 
 actually working on variable types of objects. We should move them to 
 GenericUDF for better type handling.
 This also helps to clean up the hack in doing type matching/type conversion 
 in UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-556) let hive support theta join

2009-06-10 Thread Min Zhou (JIRA)
let hive support theta join
---

 Key: HIVE-556
 URL: https://issues.apache.org/jira/browse/HIVE-556
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Min Zhou
 Fix For: 0.4.0


Right now, Hive only supports equi-joins. Sometimes that's not enough; we 
must consider implementing theta joins like

{code:sql}
SELECT
  a.subid, a.id, t.url
FROM
  tbl t JOIN aux_tbl a ON t.url rlike a.url_pattern
WHERE
  t.dt='20090609'
  AND a.dt='20090609';
{code}

Any condition expression following 'ON' should be accepted.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-338) Executing cli commands into thrift server

2009-06-09 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-338:
--

Attachment: hiveserver-v2.patch


* exec/FunctionTask.java: is it necessary to specify the loader in the 
Class.forName call? I thought that the current thread context loader was 
always the first loader to be tried anyway during name resolution.
Yes, it is necessary. The class loader held by HiveConf is older than that of 
the current thread.

This patch supports dfs, add/delete file/jar, and set now.

btw, Joydeep, would you do me a favor and write some test code? I am not 
familiar with that part; you know, 'add jar' needs a separate jar, and I am 
not quite sure how to organize it.

 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
 Attachments: hiveserver-v1.patch, hiveserver-v2.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-338) Executing cli commands into thrift server

2009-06-09 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-338:
--

Attachment: hiveserver-v2.patch

Oops, I made a mistake when uploading the former one. 

 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
 Attachments: hiveserver-v1.patch, hiveserver-v2.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-338) Executing cli commands into thrift server

2009-06-09 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated HIVE-338:
--

Attachment: (was: hiveserver-v2.patch)

 Executing cli commands into thrift server
 -

 Key: HIVE-338
 URL: https://issues.apache.org/jira/browse/HIVE-338
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Affects Versions: 0.3.0
Reporter: Min Zhou
 Attachments: hiveserver-v1.patch, hiveserver-v2.patch


 Let thrift server support set, add/delete file/jar and normal HSQL query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


