[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913895#action_12913895
 ] 

He Yongqiang commented on HIVE-1624:


For 2, it is actually a common case sometimes. For example, a user can use a PHP 
script without needing to have it locally. We can add a simple rule for 
downloading resource files, such as downloading when the path starts with the s3 
scheme in this case.
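
For illustration, a hedged sketch of the usage such a rule would enable; the bucket, 
script, and table names are hypothetical:

{code}
-- The transform script is referenced by its S3 location and is not present
-- on the client; a rule keyed on the s3 scheme would trigger the download.
SELECT TRANSFORM (col1, col2)
USING 's3://my-bucket/scripts/process.php'
AS (out1, out2)
FROM src;
{code}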

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913886#action_12913886
 ] 

John Sichi commented on HIVE-1264:
--

Mirror is up; Todd, could you test it and then update the patch with the 
location?

http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-0.20.3-CDH3-SNAPSHOT


> Make Hive work with Hadoop security
> ---
>
> Key: HIVE-1264
> URL: https://issues.apache.org/jira/browse/HIVE-1264
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Jeff Hammerbacher
>Assignee: Todd Lipcon
> Attachments: hive-1264.txt, HiveHadoop20S_patch.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913881#action_12913881
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


I will remove some of the unnecessary log statements.

The patch consists of two parts:

1. It extends the add file/jar functionality to download remote files. I think 
it makes sense as it is, since the user is expected to have access to the file 
from the client location. If that is not the case, the patch fails with an 
IOException, which notifies the user of the problem appropriately.

2. It eliminates the need for the user to run an extra add file command. The 
current norm in Hive is for the user to execute the 'add file' command to add a 
resource before using it in the transform function. This patch allows the user 
to specify a resource directly instead of doing it in two steps.
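
As a hedged sketch of the two usages described in the two parts above (bucket, 
script, table, and column names are hypothetical):

{code}
-- Two-step approach (current norm): register the resource, then reference it by name.
ADD FILE s3://my-bucket/scripts/parse.py;
SELECT TRANSFORM (line) USING 'parse.py' AS (word, cnt) FROM docs;

-- One-step approach enabled by the patch: reference the remote resource directly.
SELECT TRANSFORM (line) USING 's3://my-bucket/scripts/parse.py' AS (word, cnt) FROM docs;
{code}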

The patch does not attempt to address the case where the script exists in a 
remote location that is not accessible to the client.

Please let me know what you think.

Thanks
Vaibhav

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913880#action_12913880
 ] 

Namit Jain commented on HIVE-1665:
--

By default, the scratch dir can be based on the date etc., so that it can be 
easily cleaned up.

> drop operations may cause file leak
> ---
>
> Key: HIVE-1665
> URL: https://issues.apache.org/jira/browse/HIVE-1665
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> Right now when doing a drop, Hive first drops the metadata and then drops the 
> actual files. If the file system is down at that time, the files never get 
> deleted. 
> Had an offline discussion about this:
> To fix this, add a new conf "scratch dir" to the hive conf. 
> When doing a drop operation:
> 1) move the data to the scratch directory
> 2) drop the metadata
> 3.1) if 2) failed, roll back 1) and report an error
> 3.2) if 2) succeeded, drop the data from the scratch directory
> 4) if 3.2) fails, we are OK because we assume the scratch dir will be emptied 
> manually.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913878#action_12913878
 ] 

He Yongqiang commented on HIVE-1624:


Looks good basically; some unneeded logging information needs to be removed.

One main problem here is determining when to download the file. We cannot simply 
try downloading the file whenever it cannot be found locally. 
Sometimes scripts exist in a remote dir that the Hadoop cluster nodes can 
access but the client cannot.

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1665) drop operations may cause file leak

2010-09-22 Thread He Yongqiang (JIRA)
drop operations may cause file leak
---

 Key: HIVE-1665
 URL: https://issues.apache.org/jira/browse/HIVE-1665
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


Right now when doing a drop, Hive first drops the metadata and then drops the 
actual files. If the file system is down at that time, the files never get 
deleted. 

Had an offline discussion about this:
To fix this, add a new conf "scratch dir" to the hive conf. 
When doing a drop operation:
1) move the data to the scratch directory
2) drop the metadata
3.1) if 2) failed, roll back 1) and report an error
3.2) if 2) succeeded, drop the data from the scratch directory
4) if 3.2) fails, we are OK because we assume the scratch dir will be emptied 
manually.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913875#action_12913875
 ] 

Vaibhav Aggarwal commented on HIVE-1624:


Hi

I have attached a new patch.
It uses the add_resource functionality to make the script available to all 
nodes instead of downloading the script on each node.

Thanks
Vaibhav

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread Vaibhav Aggarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Aggarwal updated HIVE-1624:
---

Attachment: HIVE-1624-2.patch

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.4.java_only.patch
HIVE-1361.4.patch

Uploading a new patch refreshed to the latest trunk. Also added a negative 
test case (analyze.q) and some trivial cleanup in the Java code (removing 
commented-out content).
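
For reference, a hedged sketch of how the ANALYZE TABLE statement added by this 
patch is invoked (table and partition names are hypothetical):

{code}
-- Gather partition-level stats for one partition of a partitioned table:
ANALYZE TABLE page_view PARTITION (ds='2010-09-22') COMPUTE STATISTICS;

-- Gather table-level stats for a non-partitioned table:
ANALYZE TABLE users COMPUTE STATISTICS;
{code}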

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1664) Eclipse build broken

2010-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1664:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Steven!


> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1664.classpath.2.patch, HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913865#action_12913865
 ] 

John Sichi commented on HIVE-474:
-

Some comments after a brief look:

* The patch is going to need to be rebased against trunk (I guess after 
HIVE-537 is committed)?

* We should make sure that in the case of a single distinct agg, we leave the 
plan as it is today, and only use the new plan generation when multiple 
distincts are present.  This may already be the case; I couldn't quite tell 
from the example plans in the test cases (it would be nice to have some simpler 
queries for that, e.g. the sketch after this list).

* Regarding moving expression evaluation to the reduce side:  in general, this 
is something that needs cost-based optimization, due to factors like (a) data 
size before and after expression evaluation and (b) parallelization benefit of 
spreading out the computation over lots of mappers (assuming many more mappers 
than reducers).
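
A hedged sketch of the kind of minimal test queries meant in the second point 
above (table and column names follow the example in the issue description):

{code}
-- Single distinct aggregate: the existing plan should be left unchanged.
SELECT count(DISTINCT user) FROM actions;

-- Multiple distinct aggregates: only here should the new plan generation kick in.
SELECT count(DISTINCT user), count(DISTINCT session) FROM actions;
{code}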


> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Mafish
> Attachments: hive-474.0.4.2rc.patch
>
>
> The ability to select distinct several, individual columns as by example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1664) Eclipse build broken

2010-09-22 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-1664:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.7.0

> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1664.classpath.2.patch, HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1664) Eclipse build broken

2010-09-22 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-1664:
--

Attachment: HIVE-1664.classpath.2.patch

Here you go.

> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Attachments: HIVE-1664.classpath.2.patch, HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1664) Eclipse build broken

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913845#action_12913845
 ] 

John Sichi commented on HIVE-1664:
--

(Or Ashutosh, if you can give me one:  we maintain these in svn in the 
eclipse-templates subdir.)


> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Attachments: HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913843#action_12913843
 ] 

Ning Zhang commented on HIVE-1526:
--

The Hive ODBC code is dependent on Thrift as well. In particular the hive 
client and unixODBC libraries have to be linked with the new libthrift.so. Can 
you test if the ODBC code is compatible with the new thrift version?

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Attachments: hive-1526.txt, libfb303.jar, libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1664) Eclipse build broken

2010-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1664:


Assignee: Steven Wong

Steven, can you upload another patch with Ashutosh's suggestion?  I verified 
that it works already.

> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Attachments: HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1664) Eclipse build broken

2010-09-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913840#action_12913840
 ] 

Ashutosh Chauhan commented on HIVE-1664:


You need to add build/metastore/gen-java as one of the source folders in the 
Java build path of your project.  

> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
> Attachments: HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Updated: (HIVE-1609) Support partition filtering in metastore

2010-09-22 Thread Steven Wong
John,

I patched .classpath, but the build still fails, with other errors. Please look 
at https://issues.apache.org/jira/browse/HIVE-1664.

Steven


-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Tuesday, September 21, 2010 7:04 PM
To: 
Cc: Steven Wong
Subject: Re: [jira] Updated: (HIVE-1609) Support partition filtering in 
metastore

Oops, yeah, this happened the last time we upgraded datanucleus also (see 
HIVE-1373 where we fixed it).  If someone posts a patch which fixes .classpath, 
I'll commit it.

JVS

On Sep 21, 2010, at 6:51 PM, Steven Wong wrote:

> Did this check-in break the Eclipse build?
> 
> 
> -Original Message-
> From: John Sichi (JIRA) [mailto:j...@apache.org] 
> Sent: Tuesday, September 21, 2010 2:16 PM
> To: hive-dev@hadoop.apache.org
> Subject: [jira] Updated: (HIVE-1609) Support partition filtering in metastore
> 
> 
> [ 
> https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
> 
> John Sichi updated HIVE-1609:
> -
> 
>  Status: Resolved  (was: Patch Available)
>Hadoop Flags: [Reviewed]
>  Resolution: Fixed
> 
> Committed.  Thanks Ajay!
> 
> 
>> Support partition filtering in metastore
>> 
>> 
>>Key: HIVE-1609
>>URL: https://issues.apache.org/jira/browse/HIVE-1609
>>Project: Hadoop Hive
>> Issue Type: New Feature
>> Components: Metastore
>>   Reporter: Ajay Kidave
>>   Assignee: Ajay Kidave
>>Fix For: 0.7.0
>> 
>>Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch
>> 
>> 
>> The metastore needs to have support for returning a list of partitions based 
>> on user specified filter conditions. This will be useful for tools which 
>> need to do partition pruning. Howl is one such use case. The way partition 
>> pruning is done during hive query execution need not be changed.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 




[jira] Updated: (HIVE-1664) Eclipse build broken

2010-09-22 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-1664:
--

Attachment: HIVE-1664.classpath.patch

The attached HIVE-1664.classpath.patch fixes the .classpath problems. But the 
build remains broken with these errors that I have little clue about:

Description | Resource | Path | Location | Type
FilterLexer cannot be resolved to a type | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 993 | Java Problem
FilterParser cannot be resolved to a type | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 998 | Java Problem
FilterParser cannot be resolved to a type | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 998 | Java Problem
The import org.apache.hadoop.hive.metastore.parser.FilterLexer cannot be resolved | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 73 | Java Problem
The import org.apache.hadoop.hive.metastore.parser.FilterParser cannot be resolved | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 74 | Java Problem
FilterLexer cannot be resolved to a type | ObjectStore.java | /trunk/metastore/src/java/org/apache/hadoop/hive/metastore | line 993 | Java Problem


> Eclipse build broken
> 
>
> Key: HIVE-1664
> URL: https://issues.apache.org/jira/browse/HIVE-1664
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.7.0
>Reporter: Steven Wong
> Attachments: HIVE-1664.classpath.patch
>
>
> After updating trunk to r999644, Eclipse build is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1664) Eclipse build broken

2010-09-22 Thread Steven Wong (JIRA)
Eclipse build broken


 Key: HIVE-1664
 URL: https://issues.apache.org/jira/browse/HIVE-1664
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.7.0
Reporter: Steven Wong


After updating trunk to r999644, Eclipse build is broken.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1526:
--

Attachment: hive-1526.txt
libthrift.jar
libfb303.jar

Here is a patch along with the newly built jars from Thrift 0.4.0.

I agree that long term we should make codegen part of the build, but requiring 
everyone to install the same version of Thrift is enough of a hassle that we 
should punt on it for now.

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Attachments: hive-1526.txt, libfb303.jar, libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.3.patch

Updated HIVE-1361.3.patch.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: (was: HIVE-1361.3.patch)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1661) Default values for parameters

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913791#action_12913791
 ] 

He Yongqiang commented on HIVE-1661:


+1, looks good.


> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913787#action_12913787
 ] 

Todd Lipcon commented on HIVE-842:
--

I don't anticipate breaking the web UI (or anything) on non-secure Hadoop 
versions. But it will probably be insecure to run the web UI, which currently 
trusts users to say who they want to be - i.e. I don't plan in the short term to 
integrate an auth layer for the web UI itself.

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913783#action_12913783
 ] 

Namit Jain commented on HIVE-1361:
--

Ning, the latest patch contains the output of svn stat

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Mirror for datanucleus?

2010-09-22 Thread John Sichi

On Sep 20, 2010, at 7:43 PM, Todd Lipcon wrote:

> Anyone else noticed that the datanucleus repository is going super
> slow today? Any chance we can get a mirror up for that one?


Worked OK for me today.  It would be nice to set up some kind of (public) 
automatic caching proxy.

JVS



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913741#action_12913741
 ] 

Edward Capriolo commented on HIVE-842:
--

By "attack the web UI separately", what is meant? Will it be broken or 
non-functional at any phase here? That is what I find often happens; some of it 
is really the web UI's fault for using JSP and not servlets, but there is no 
simple way to get code coverage of the web UI and all the different ways it gets broken. 

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1496) enhance CREATE INDEX to support immediate index build

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913737#action_12913737
 ] 

John Sichi commented on HIVE-1496:
--

The implementation for this will need to chain a task that does the actual 
index build together with a task that does the metastore update.  It should be 
similar to CREATE TABLE AS SELECT (which both creates the table definition in 
the metastore and does the equivalent of an INSERT to populate it with the 
SELECT results).

Use "EXPLAIN CREATE TABLE p AS SELECT * FROM pokes;" to see the combined plan.  
And see the end of SemanticAnalyzer.genMapRedTasks for where it chains the 
tasks together.

{noformat}
if (qb.isCTAS()) {
  // generate a DDL task and make it a dependent task of the leaf
  ...
{noformat}

For immediate index build, we want to combine the existing CREATE INDEX with 
ALTER INDEX REBUILD.  One hiccup may be that the rebuild already wants the 
index to be defined in the metastore, whereas for CREATE TABLE AS SELECT we do 
it in the opposite order (only populating the metastore after the data is 
successfully loaded).  It may be acceptable to just make the CREATE INDEX 
non-atomic (i.e. populate the metastore first, and if the rebuild fails, we 
leave the index empty; the user can retry with ALTER INDEX REBUILD, same as if 
it had been deferred in the first place).
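
As a hedged sketch, these are the two existing statements whose effects would be 
combined (table, column, and index names are hypothetical; the deferred-build 
syntax is the one that exists today):

{code}
-- Today: the index is created with a deferred build and then rebuilt explicitly.
CREATE INDEX idx ON TABLE pokes (foo) AS 'compact' WITH DEFERRED REBUILD;
ALTER INDEX idx ON pokes REBUILD;

-- The goal here: a single CREATE INDEX (without DEFERRED REBUILD) that also
-- populates the index, analogous to CREATE TABLE AS SELECT.
{code}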

Ning Zhang (nzhang at facebook dot com) did the CREATE TABLE AS SELECT 
implementation, so he may be able to provide help if you run into trouble with 
this one.


> enhance CREATE INDEX to support immediate index build
> -
>
> Key: HIVE-1496
> URL: https://issues.apache.org/jira/browse/HIVE-1496
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> Currently we only support WITH DEFERRED REBUILD.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913731#action_12913731
 ] 

John Sichi edited comment on HIVE-1501 at 9/22/10 3:22 PM:
---

Russell, please reassign to the actual owner.

  was (Author: jvs):
Russell, please reasssign to the actual owner.
  
> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1496) enhance CREATE INDEX to support immediate index build

2010-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1496:


Assignee: Russell Melick  (was: He Yongqiang)

Russell, please reassign to the actual owner.


> enhance CREATE INDEX to support immediate index build
> -
>
> Key: HIVE-1496
> URL: https://issues.apache.org/jira/browse/HIVE-1496
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> Currently we only support WITH DEFERRED REBUILD.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1501:


Assignee: Russell Melick  (was: He Yongqiang)

Russell, please reasssign to the actual owner.

> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913729#action_12913729
 ] 

John Sichi commented on HIVE-1497:
--

For the implementation, it should be possible to follow the pattern for the 
existing SHOW PARTITIONS command.

* showStatement in Hive.g
* DDLSemanticAnalyzer.analyzeShowPartitions
* ShowPartitionsDesc
* DDLTask.showPartitions; for returning multiple fields per output row, use 
out.write(separator)
* Hive.java does not currently have a getIndexes method, so you'll need to add 
that.  But IMetaStoreClient does have a listIndexes method already, so you can 
call that.


> support COMMENT clause on CREATE INDEX, and add new commands for 
> SHOW/DESCRIBE indexes
> --
>
> Key: HIVE-1497
> URL: https://issues.apache.org/jira/browse/HIVE-1497
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into 
> account.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1661) Default values for parameters

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913726#action_12913726
 ] 

Namit Jain commented on HIVE-1661:
--

Yongqiang, can you take a look at this ?

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-22 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913725#action_12913725
 ] 

HBase Review Board commented on HIVE-1378:
--

Message from: "Ning Zhang" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/828/#review1296
---



trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java


Constructing an ArrayList for every row is very expensive. Do you need a 
separate copy for every row, or can you share a "cache" among rows?



trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java


Removing this changes the behavior: previously it threw an exception on a 
schema mismatch; now it tolerates it. It would be good to retain backward 
compatibility.


- Ning





> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913723#action_12913723
 ] 

John Sichi commented on HIVE-1501:
--

Example:

create table `_t`(`_i` int, `_j` int);
create index x on table `_t`(`_j`) as 'compact' with deferred rebuild;
alter index x on `_t` rebuild;

gives

FAILED: Parse Error: line 1:48 mismatched input ',' expecting CharSetLiteral in 
character string literal

To see why, look at 
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler, method 
getIndexBuilderMapRedTask.  It constructs an internal SQL statement (INSERT) 
which populates the index table structure.  However, it neglects to quote the 
table/column names, leading to invalid syntax.  (Hive uses backticks to quote 
identifiers with special characters--I think this currently only applies to 
leading underscores, but later we'll support arbitrary identifiers.)

HiveUtils.unparseIdentifier should be used for quoting.
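
As a hedged illustration, the same identifiers parse fine once they are 
backtick-quoted, which is what the generated INSERT should do as well (the exact 
generated statement depends on the index table naming):

{code}
-- Backtick-quoted identifiers with leading underscores are accepted by the parser:
SELECT `_i`, `_j` FROM `_t` WHERE `_j` > 0;
{code}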


> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1661:
--

Attachment: HIVE-1661.2.patch

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1661:
--

Attachment: (was: HIVE-1661.2.patch)

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1661:
--

Attachment: HIVE-1661.2.patch

Cleaned up the code that Eclipse had modified.

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1661:
--

Status: Patch Available  (was: Open)

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-22 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1661:
--

Attachment: HIVE-1661.1.patch

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Venkatesh S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913706#action_12913706
 ] 

Venkatesh S commented on HIVE-842:
--

Sounds good to me.




> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913691#action_12913691
 ] 

Todd Lipcon commented on HIVE-842:
--

OK. The code in Hadoop Common is somewhat reusable for this, so it shouldn't be 
too hard to implement. If I recall correctly, though, the delegation tokens 
rely on a secret key that the master daemon periodically rotates. We need to 
add some kind of persistent token storage for this to work - I guess in the 
metastore's DB?

To make this easier to review, I'd like to do the straight kerberos first, and 
then add delegation tokens in a second patch/JIRA. Sound good?

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-22 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913670#action_12913670
 ] 

Zheng Shao commented on HIVE-537:
-

{code}
union create_union(byte tag, T0 o0, T1 o1, T2 o2, ...)
Some real examples:
union create_union( is_student ? 0 : 1, school, company)
{code}

Depending on the value of the tag, the returned union object will choose to 
store only the object corresponding to that tag.
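
A hedged usage sketch in HiveQL; the table name people is hypothetical, and IF() 
stands in for the ternary notation used above:

{code}
-- Tag 0 selects school, tag 1 selects company; the result is a single union-typed value.
SELECT create_union(IF(is_student, 0, 1), school, company) FROM people;
{code}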


> Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
> map)
> ---
>
> Key: HIVE-537
> URL: https://issues.apache.org/jira/browse/HIVE-537
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt
>
>
> There are already some cases inside the code that we use heterogeneous data: 
> JoinOperator, and UnionOperator (in the sense that different parents can pass 
> in records with different ObjectInspectors).
> We currently use Operator's parentID to distinguish that. However that 
> approach does not extend to more complex plans that might be needed in the 
> future.
> We will support the union type like this:
> {code}
> TypeDefinition:
>   type: primitivetype | structtype | arraytype | maptype | uniontype
>   uniontype: "union" "<" tag ":" type ("," tag ":" type)* ">"
> Example:
>   union<0:int,1:double,2:array,3:struct>
> Example of serialized data format:
>   We will first store the tag byte before we serialize the object. On 
> deserialization, we will first read out the tag byte, then we know what is 
> the current type of the following object, so we can deserialize it 
> successfully.
> Interface for ObjectInspector:
> interface UnionObjectInspector {
>   /** Returns the array of OIs that are for each of the tags
>*/
>   ObjectInspector[] getObjectInspectors();
>   /** Return the tag of the object.
>*/
>   byte getTag(Object o);
>   /** Return the field based on the tag value associated with the Object.
>*/
>   Object getField(Object o);
> };
> An example serialization format (Using deliminated format, with ' ' as 
> first-level delimitor and '=' as second-level delimitor)
> userid:int,log:union<0:struct>,1:string>
> 123 1=login
> 123 0=243=helloworld
> 123 1=logout
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #547

2010-09-22 Thread Apache Hudson Server
See 

Changes:

[namit] Add TestRemoteHiveMetastor removed by mistake

[namit] HIVE-1655.  Adding consistency check at jobClose() when committing 
dynamic
partitions (Ning Zhang via namit)

[jvs] HIVE-1609. Support partition filtering in metastore
(Ajay Kidave via jvs)

[namit] HIVE-1534. Join filters do not work correctly with outer joins
(Amareshwari Sriramadasu via namit)

--
[...truncated 3677 lines...]
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDAFResolver.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/package-info.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotNull.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NGramEstimator.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLocate.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCase.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/Collector.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInstr.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFWhen.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseBitOP.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLn.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPLessThan.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseCompare.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFloor.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLog10.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTan.java
A ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFindInSet.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java

[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913668#action_12913668
 ] 

Ning Zhang commented on HIVE-1658:
--

+1 on keeping the old format but adding a "pretty operator" as the child of the 
explain, so that the execution plan for the EXPLAIN is an explain operator 
(with the old formatting) followed by an optional "pretty operator" taking the 
output and doing further formatting.
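
A minimal sketch of the chaining being suggested here, using illustrative class 
and method names rather than anything from the actual Hive codebase:

{code}
// Sketch only: the plain explain output is always produced in the old format,
// and an optional "pretty" stage re-formats it afterwards. All names here are
// assumptions for illustration.
import java.util.ArrayList;
import java.util.List;

public class ExplainChain {

  /** Stage 1: stand-in for the existing explain operator (old formatting). */
  static List<String> plainExplain(List<String> planLines) {
    return planLines;
  }

  /** Stage 2 (optional): stand-in "pretty operator" that indents each line. */
  static List<String> prettyFormat(List<String> planLines) {
    List<String> out = new ArrayList<String>();
    for (String line : planLines) {
      out.add("    " + line);
    }
    return out;
  }

  static List<String> explain(List<String> planLines, boolean pretty) {
    List<String> plain = plainExplain(planLines);
    return pretty ? prettyFormat(plain) : plain;
  }
}
{code}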

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name <tab> type <tab> comment
> to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1599) optimize mapjoin to use distributedcache

2010-09-22 Thread Jacob Rideout (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913664#action_12913664
 ] 

Jacob Rideout commented on HIVE-1599:
-

Additionally, if jvm reuse is enabled, mappers running within the same jvm can 
reuse an in-memory (static?) copy of the data. When we implement map joins (in 
a non-hive java map-reduce job) and have jvm reuse enabled, we've seen 
significant performance improvements with many maps.
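
A minimal sketch of the per-JVM static cache this refers to; the class name and 
the loading step are assumptions for illustration, not Hive code:

{code}
// Sketch: a process-wide cache of the map-join side table. With JVM reuse
// enabled, later map tasks in the same JVM find the data already loaded and
// skip the read. All names are illustrative.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class MapJoinSideCache {

  // Keyed by the small table's path; lives for the lifetime of the JVM.
  private static final ConcurrentHashMap<String, List<String>> CACHE =
      new ConcurrentHashMap<String, List<String>>();

  public static List<String> get(String path) throws Exception {
    List<String> rows = CACHE.get(path);
    if (rows == null) {
      rows = loadRows(path);                       // expensive read, once per JVM
      List<String> prev = CACHE.putIfAbsent(path, rows);
      if (prev != null) {
        rows = prev;                               // another task loaded it first
      }
    }
    return rows;
  }

  private static List<String> loadRows(String path) throws Exception {
    // Placeholder for reading and deserializing the small table.
    return new ArrayList<String>();
  }
}
{code}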

> optimize mapjoin to use distributedcache
> 
>
> Key: HIVE-1599
> URL: https://issues.apache.org/jira/browse/HIVE-1599
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
> Fix For: 0.7.0
>
>
> Currently, each mapper reads the file locally in case of a mapjoin. This 
> creates problems if the number
> of mappers is very high.
> It would be optimal to put the files in the distributedcache before the job 
> starts, and then the mappers
> can read it from the cache instead of reading from hdfs as they do currently.
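
A rough sketch of the Hadoop DistributedCache calls such an optimization would 
rely on; the paths and surrounding wiring are placeholders:

{code}
// Sketch: register the map-join file at job-setup time, then read the local
// copy inside each mapper instead of going back to HDFS.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class MapJoinCacheSetup {

  /** Client side, before job submission: ship the small table with the job. */
  public static void addSmallTable(Configuration conf, String hdfsPath)
      throws Exception {
    DistributedCache.addCacheFile(new URI(hdfsPath), conf);
  }

  /** Mapper side: the file is already local; return its local path. */
  public static Path localCopy(Configuration conf) throws Exception {
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    return (cached != null && cached.length > 0) ? cached[0] : null;
  }
}
{code}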

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913662#action_12913662
 ] 

Namit Jain commented on HIVE-1658:
--

@Thiruvel, can we keep the new output in the old format?
I mean, we just have to make sure that the output has 3 columns separated by a 
delimiter.

So, if your current output is 'x', you can replace it with:

<tab>x<tab>

i.e. an implicit null at the beginning and end.




> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name <tab> type <tab> comment
> to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.3.patch

Uploading HIVE-1361.3.patch which passes all tests on hadoop 0.20 & 0.17. The 
only difference from the last patch is the log change in stats2.q.out.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
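
The statistics listed above might be represented roughly as follows; the class 
and field names are illustrative assumptions, not the classes introduced by the 
patch:

{code}
// Sketch of the proposed stats. Partition-level stats hold row/size/file
// counts; table-level stats add the partition count on top.
public class PartitionStats {
  long numRows;
  long totalSizeInBytes;
  long numFiles;
  long maxRowSize, minRowSize;
  double avgRowSize;
  long maxFileSize, minFileSize;
  double avgFileSize;
}

class TableStats extends PartitionStats {
  long numPartitions;   // only meaningful at the table level
}
{code}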

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reassigned HIVE-1633:
-

Assignee: Amareshwari Sriramadasu

> CombineHiveInputFormat fails with "cannot find dir for emptyFile"
> -
>
> Key: HIVE-1633
> URL: https://issues.apache.org/jira/browse/HIVE-1633
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913475#action_12913475
 ] 

Amareshwari Sriramadasu commented on HIVE-537:
--

Zheng, can you give an example usage of the union type in a UDF? I looked at the 
Struct, Map and Array UDFs, but Union is quite different from them because it 
holds only one object at any point in time.
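
A minimal sketch of what such a usage might look like, built only around the 
UnionObjectInspector interface quoted below; the stub ObjectInspector and all 
other names are assumptions for illustration:

{code}
// Sketch: evaluation code reads the tag first, then interprets the single
// stored value with the inspector registered for that tag.
public class UnionTagDispatch {

  interface ObjectInspector { }        // stand-in for the existing serde2 interface

  interface UnionObjectInspector {     // as proposed in this issue
    ObjectInspector[] getObjectInspectors();
    byte getTag(Object o);
    Object getField(Object o);
  }

  static String describe(Object unionValue, UnionObjectInspector unionOI) {
    byte tag = unionOI.getTag(unionValue);
    Object value = unionOI.getField(unionValue);
    ObjectInspector valueOI = unionOI.getObjectInspectors()[tag];
    return "tag=" + tag + " (inspected by " + valueOI + "), value=" + value;
  }
}
{code}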

> Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
> map)
> ---
>
> Key: HIVE-537
> URL: https://issues.apache.org/jira/browse/HIVE-537
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt
>
>
> There are already some cases inside the code where we use heterogeneous data: 
> JoinOperator and UnionOperator (in the sense that different parents can pass 
> in records with different ObjectInspectors).
> We currently use the Operator's parentID to distinguish between them. However, 
> that approach does not extend to more complex plans that might be needed in 
> the future.
> We will support the union type like this:
> {code}
> TypeDefinition:
>   type: primitivetype | structtype | arraytype | maptype | uniontype
>   uniontype: "union" "<" tag ":" type ("," tag ":" type)* ">"
> Example:
>   union<0:int,1:double,2:array,3:struct>
> Example of serialized data format:
>   We will first store the tag byte before we serialize the object. On 
> deserialization, we will first read out the tag byte, then we know what is 
> the current type of the following object, so we can deserialize it 
> successfully.
> Interface for ObjectInspector:
> interface UnionObjectInspector {
>   /** Returns the array of OIs that are for each of the tags
>*/
>   ObjectInspector[] getObjectInspectors();
>   /** Return the tag of the object.
>*/
>   byte getTag(Object o);
>   /** Return the field based on the tag value associated with the Object.
>*/
>   Object getField(Object o);
> };
> An example serialization format (using a delimited format, with ' ' as the 
> first-level delimiter and '=' as the second-level delimiter)
> userid:int,log:union<0:struct>,1:string>
> 123 1=login
> 123 0=243=helloworld
> 123 1=logout
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Venkatesh S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913466#action_12913466
 ] 

Venkatesh S commented on HIVE-842:
--

> *  Do Hive tasks ever need to authenticate to the metastore? If so, we 
> will have to build a delegation token system into Hive.
I learned from Alan and Pradeep that Howl uses the commit task to talk to the 
metastore. Hence we'll have to build the delegation token system.

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913439#action_12913439
 ] 

Todd Lipcon commented on HIVE-842:
--

As discussed at the last contributor meeting, I am working on authenticating 
access to the metastore by kerberizing the Thrift interface.

Plan is currently:
1) Update the version of Thrift in Hive to 0.4.0
2) Temporarily check in the SASL support from Thrift trunk (this will be in the 
0.5.0 release, due out sometime in October)
3) Build a bridge between Thrift's SASL support and Hadoop's 
UserGroupInformation classes. Thus, if a user has a current UGI on the client 
side, it will get propagated to the JAAS context on the handler side.
4) In places where the metastore accesses the file system, use the "proxy user" 
functionality to act on behalf of the authenticated user.
5) When we detect that we are running on secure hadoop with security enabled, 
enable the above functionality.
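
For step 4, a rough sketch of the Hadoop proxy-user pattern being referred to; 
the method, user name, and filesystem operation are placeholders, and it assumes 
the cluster's hadoop.proxyuser.* settings allow the metastore principal to 
impersonate callers:

{code}
// Sketch: the metastore performs a filesystem operation as the authenticated
// caller rather than as its own service principal.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {

  public static boolean mkdirAsUser(final Configuration conf,
      final String callerShortName, final String dir) throws Exception {
    UserGroupInformation proxy = UserGroupInformation.createProxyUser(
        callerShortName, UserGroupInformation.getLoginUser());
    // Everything inside doAs() runs with the caller's identity, so HDFS
    // permission checks apply to that user, not to the metastore service.
    return proxy.doAs(new PrivilegedExceptionAction<Boolean>() {
      public Boolean run() throws Exception {
        FileSystem fs = FileSystem.get(conf);
        return fs.mkdirs(new Path(dir));
      }
    });
  }
}
{code}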

I'd like to attack the Hive Web UI separately.

One open question:
- Do Hive *tasks* ever need to authenticate to the metastore? If so, we will 
have to build a delegation token system into Hive.

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-842:


Assignee: Todd Lipcon

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.