[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1361:
-

Status: Open  (was: Patch Available)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend the metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
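The per-file numbers listed above can all be gathered in a single pass over a partition's data files. A minimal local-filesystem sketch of that pass (a hypothetical helper for illustration; Hive's implementation works against HDFS, not java.io):

```java
import java.io.File;

// Hypothetical sketch: compute the proposed partition-level file stats
// (number of files, total size, max/min/average file size) from a local
// directory in one pass. Not Hive's actual stats-gathering code.
public class PartitionFileStats {
    public final long numFiles;
    public final long totalBytes;
    public final long maxFileBytes;
    public final long minFileBytes;
    public final double avgFileBytes;

    public PartitionFileStats(long n, long total, long max, long min) {
        this.numFiles = n;
        this.totalBytes = total;
        this.maxFileBytes = max;
        this.minFileBytes = min;
        this.avgFileBytes = n == 0 ? 0.0 : (double) total / n;
    }

    public static PartitionFileStats scan(File partitionDir) {
        long n = 0, total = 0, max = Long.MIN_VALUE, min = Long.MAX_VALUE;
        File[] files = partitionDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (!f.isFile()) continue;   // stats cover data files only
                long len = f.length();
                n++;
                total += len;
                max = Math.max(max, len);
                min = Math.min(min, len);
            }
        }
        if (n == 0) { max = 0; min = 0; }
        return new PartitionFileStats(n, total, max, min);
    }
}
```

The row-level stats (max/min/average row sizes) would be collected analogously, but on the write path rather than from file metadata.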

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910903#action_12910903
 ] 

Namit Jain commented on HIVE-1617:
--

Committed. Thanks Paul

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> In the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.
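The proposed fix can be sketched as a heartbeat loop with a hard cap: once the cap elapses, reporting stops, so a genuinely hung script is eventually killed by the framework's normal task timeout. The names below are hypothetical, not Hive's actual AutoProgressor code:

```java
// Hypothetical sketch: an auto-progress loop that stops reporting once a
// configurable cap (e.g. one hour) has elapsed. After it stops, a stuck
// script is no longer kept alive artificially, so the task timeout fires.
public class BoundedAutoProgressor {
    /**
     * @return number of heartbeats sent before the cap was reached
     */
    public static int run(long intervalMs, long maxDurationMs, Runnable heartbeat)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxDurationMs;
        int sent = 0;
        while (System.currentTimeMillis() < deadline) {
            heartbeat.run();          // stands in for reporter.progress()
            sent++;
            Thread.sleep(intervalMs);
        }
        // Cap reached: stop reporting and let the framework's timeout
        // take over if the script is truly stuck.
        return sent;
    }
}
```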




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910900#action_12910900
 ] 

HBase Review Board commented on HIVE-1378:
--

Message from: "Ning Zhang" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/828/#review1269
---


Looks good in general. Some minor comments. I'll also ask Zheng to review it. 


trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


Can you also add this parameter to conf/hive-default.xml? This file should 
contain all user-tunable parameters in HiveConf.java.



trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java


should null be checked first?



trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java


check null first?


- Ning





> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from the JDBC Hive driver, the 
> return value for map, array, and struct needs to be a string
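The idea can be sketched as a small conversion helper: before handing a map/array/struct value to a JDBC caller, render it as a string, checking null first as the review comments above suggest. The rendering format here is illustrative only; the actual patch may format values differently:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the HIVE-1378 idea: when a JDBC client asks for
// a column whose Hive type is map/array/struct, return a string rendering
// of the value instead of an unusable raw object. Uses toString-style
// rendering purely for illustration.
public class ComplexValueToString {
    public static String render(Object value) {
        if (value == null) {
            return null;                 // check null first, per the review
        }
        if (value instanceof Map || value instanceof List) {
            return value.toString();     // e.g. {k=v}, [1, 2]
        }
        return String.valueOf(value);    // primitives pass through
    }
}
```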




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910899#action_12910899
 ] 

Ning Zhang commented on HIVE-1378:
--

Looks good in general. I've left some minor comments on Cloudera's review 
board. I'm not sure if they can be replicated here; if not, I'll copy them 
over manually.

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from the JDBC Hive driver, the 
> return value for map, array, and struct needs to be a string




[jira] Created: (HIVE-1656) All TestJdbcDriver test cases fail in Eclipse unless a property is added in run config

2010-09-17 Thread Steven Wong (JIRA)
All TestJdbcDriver test cases fail in Eclipse unless a property is added in run 
config
--

 Key: HIVE-1656
 URL: https://issues.apache.org/jira/browse/HIVE-1656
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.7.0
Reporter: Steven Wong


All TestJdbcDriver test cases fail in Eclipse, unless I add the following 
property in the TestJdbc run configuration ("Arguments" tab --> "VM arguments" 
box):

-Dtest.warehouse.dir="${workspace_loc:trunk}/build/ql/test/data/warehouse"





[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Attachment: HIVE-1655.patch

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1655.patch
>
>
> In the case of a dynamic partition insert, FileSinkOperator generates a 
> directory for a new partition, and the files in the directory are named with 
> '_tmp*'. When a task succeeds, the file is renamed to remove the "_tmp", which 
> essentially implements the "commit" semantics. Many kinds of failures (the 
> process gets killed, a machine dies, etc.) can leave _tmp files behind in the 
> DP directory. These _tmp files should be deleted ("rolled back") at a 
> successful jobClose(). After the deletion, we should also delete any empty 
> directories.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Status: Patch Available  (was: Open)

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1655.patch
>
>
> In the case of a dynamic partition insert, FileSinkOperator generates a 
> directory for a new partition, and the files in the directory are named with 
> '_tmp*'. When a task succeeds, the file is renamed to remove the "_tmp", which 
> essentially implements the "commit" semantics. Many kinds of failures (the 
> process gets killed, a machine dies, etc.) can leave _tmp files behind in the 
> DP directory. These _tmp files should be deleted ("rolled back") at a 
> successful jobClose(). After the deletion, we should also delete any empty 
> directories.




[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-17 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910886#action_12910886
 ] 

Vaibhav Aggarwal commented on HIVE-1620:


I think that is a good suggestion.
I will try to break down the functionality into a separate class and will 
resubmit the patch.

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly 
> to S3.
> This patch allows users to specify an S3 location as the table output location 
> and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.




[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910880#action_12910880
 ] 

Ning Zhang commented on HIVE-1655:
--

Actually the _tmp files are taken care of by FSPaths.commit(), called at 
FileSinkOperator.close(), and any missed _tmp* files are removed in jobClose() 
-> Utilities.removeTempOrDuplicateFiles(). The only missing piece is removing 
the empty directories at jobClose().
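The remaining cleanup described here (delete stray _tmp* files, then prune directories left empty) can be sketched on a local filesystem; Hive's real jobClose() code uses Hadoop's FileSystem API rather than java.io.File:

```java
import java.io.File;

// Hypothetical local-filesystem sketch of the missing cleanup: remove
// leftover _tmp* files in a dynamic-partition output tree, then delete
// any directories that end up empty. Illustrative only; the actual
// implementation targets org.apache.hadoop.fs.FileSystem.
public class DynamicPartitionCleanup {
    /** Returns true if dir ended up empty and was deleted. */
    public static boolean cleanup(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    cleanup(child);                 // recurse into sub-partitions
                } else if (child.getName().startsWith("_tmp")) {
                    child.delete();                 // roll back uncommitted file
                }
            }
        }
        File[] remaining = dir.listFiles();
        if (remaining != null && remaining.length == 0) {
            return dir.delete();                    // prune now-empty partition dir
        }
        return false;
    }
}
```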

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> In the case of a dynamic partition insert, FileSinkOperator generates a 
> directory for a new partition, and the files in the directory are named with 
> '_tmp*'. When a task succeeds, the file is renamed to remove the "_tmp", which 
> essentially implements the "commit" semantics. Many kinds of failures (the 
> process gets killed, a machine dies, etc.) can leave _tmp files behind in the 
> DP directory. These _tmp files should be deleted ("rolled back") at a 
> successful jobClose(). After the deletion, we should also delete any empty 
> directories.




[jira] Updated: (HIVE-1226) support filter pushdown against non-native tables

2010-09-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1226:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks John!

> support filter pushdown against non-native tables
> -
>
> Key: HIVE-1226
> URL: https://issues.apache.org/jira/browse/HIVE-1226
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler, Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1226.1.patch, HIVE-1226.2.patch, HIVE-1226.3.patch, 
> HIVE-1226.4.patch
>
>
> For example, HBase's scan object can take filters.




[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE/ALTER INDEX

2010-09-17 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1498:
-

Attachment: hive-1498.prelim.patch

This patch is very preliminary. We're still getting the testing environment 
working, so we can't yet verify that it passes the tests.

> support IDXPROPERTIES on CREATE/ALTER INDEX
> ---
>
> Key: HIVE-1498
> URL: https://issues.apache.org/jira/browse/HIVE-1498
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1498.prelim.patch
>
>
> It's partially there in the grammar but not hooked in; should work pretty 
> much the same as TBLPROPERTIES.




[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910851#action_12910851
 ] 

HBase Review Board commented on HIVE-1361:
--

Message from: "namit jain" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/862/#review1264
---



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java


This code seems useless



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java


How are you accounting for speculative execution?

Can 2 tasks insert the entry?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


It might be a good idea to make it easy 
to add new stats. Right now, you will need
to fix code in multiple places.

Instead of hard-coding nRowsInTable, it
would be good to keep an array of stats
we are publishing in a central place



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


This (addOutputs()) should be done at 
compile time



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java


Most of these parameters need not be 
instance variables - have a new function
where these are defined



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java


Can you add publishStats in Utilities and
let TableScan and FileSink share it



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


I am assuming these red blocks mean TABs



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java


??



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java


Do we need to lock the row ?
use a SELECT FOR UPDATE instead of
SELECT


- namit





> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend the metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Created: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)
Adding consistency check at jobClose() when committing dynamic partitions
-

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang


In the case of a dynamic partition insert, FileSinkOperator generates a directory 
for a new partition, and the files in the directory are named with '_tmp*'. When 
a task succeeds, the file is renamed to remove the "_tmp", which essentially 
implements the "commit" semantics. Many kinds of failures (the process gets 
killed, a machine dies, etc.) can leave _tmp files behind in the DP directory. 
These _tmp files should be deleted ("rolled back") at a successful jobClose(). 
After the deletion, we should also delete any empty directories.




[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-17 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910839#action_12910839
 ] 

Paul Yang commented on HIVE-1617:
-

I've tried creating a test for this but it seems like for local mode, the task 
timeouts don't work. Unless we change how hadoop's local mode works, we might 
have to go without a test case.

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> In the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910837#action_12910837
 ] 

Joydeep Sen Sarma commented on HIVE-1651:
-

yeah - but then the directory itself should be created as a tmp directory, and 
we should promote the directory to its final name only when closing 
successfully.

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
> script and then forwards the output from stdout to downstream operators. If 
> the script hits any exception (e.g., it got killed), the ScriptOperator gets 
> an exception and throws it to upstream operators until MapOperator gets it 
> and calls close(abort). Before ScriptOperator.close() is called, the script's 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.
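The fix can be sketched as an abort flag that the stdout-draining thread checks before forwarding each row: once an exception is signalled, nothing more reaches downstream operators, even before close(abort) runs. This is illustrative, not ScriptOperator's actual code:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the thread draining the script's stdout checks an
// abort flag before forwarding each row, so forwarding stops the moment
// an exception is signalled rather than only when close(abort) is called.
public class AbortableForwarder {
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    private final List<String> downstream = new ArrayList<>();

    public void abort() { aborted.set(true); }

    public List<String> forwarded() { return downstream; }

    public void drain(Reader scriptStdout) throws Exception {
        BufferedReader r = new BufferedReader(scriptStdout);
        String line;
        while ((line = r.readLine()) != null) {
            if (aborted.get()) {
                return;            // stop forwarding immediately on abort
            }
            downstream.add(line);  // stands in for forward(row) to children
        }
    }
}
```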




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910834#action_12910834
 ] 

Ning Zhang commented on HIVE-1651:
--

@joydeep, the output file will not be committed if an exception occurs and 
close(abort=true) is called. The bug happens in the short time window after the 
exception occurs and before close(abort) is called. Although the file gets 
deleted, the dynamic partition insert has already created a directory, which 
will later be considered an empty partition. 

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
> script and then forwards the output from stdout to downstream operators. If 
> the script hits any exception (e.g., it got killed), the ScriptOperator gets 
> an exception and throws it to upstream operators until MapOperator gets it 
> and calls close(abort). Before ScriptOperator.close() is called, the script's 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Created: (HIVE-1654) select distinct should allow column name regex

2010-09-17 Thread John Sichi (JIRA)
select distinct should allow column name regex
--

 Key: HIVE-1654
 URL: https://issues.apache.org/jira/browse/HIVE-1654
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0


This works (matching column name foo):

select `fo.*` from pokes;

but this

select distinct `fo.*` from pokes;

gives

FAILED: Error in semantic analysis: line 1:16 Invalid Table Alias or Column 
Reference `fo.*`

It should work consistently.





[jira] Updated: (HIVE-1517) ability to select across a database

2010-09-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1517:
-

Attachment: HIVE-1517.1.patch.txt

> ability to select across a database
> ---
>
> Key: HIVE-1517
> URL: https://issues.apache.org/jira/browse/HIVE-1517
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1517.1.patch.txt
>
>
> After  https://issues.apache.org/jira/browse/HIVE-675, we need a way to be 
> able to select across a database for this feature to be useful.
> For example:
> use db1
> create table foo();
> use db2
> select .. from db1.foo.




[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910813#action_12910813
 ] 

HBase Review Board commented on HIVE-1361:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/862/#review1262
---



trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


Hive conf additions should be accompanied by new entries in 
conf/hive-default.xml for documentation purposes.



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


Using e.toString() alone here may lose some of the diagnostics.

LOG.error has an overload which takes a Throwable parameter; use that to 
make sure that all the diagnostics (e.g. nested throwables) are logged.



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


As a performance followup, we probably want to use delete(List) for 
batching.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java


See perf comment above.  Also, this scan+delete code could be shared to 
avoid duplication.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java


See comments in HBaseStatsAggregator regarding diagnostics.




trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java


Another perf note:  for batch update, we can use setAutoFlush(false) and 
then flushCommits in closeConnection.



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java


Probably need a followup to make this configurable.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java


What is this code for?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


Isn't this going to throw an NPE if aggregateStats returns null after 
handling an error?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java


s/retur/return/



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java


typo:  MapRedTaks



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Some more overview (or just link to updated wiki doc) would be good here 
since the methods below reference things like temporary stats and aggregation 
without really explaining them.

Also:  I think having the publisher/aggregator implementations catch errors 
themselves is confusing.  It would be cleaner to let them propagate the 
exceptions, and instead catch+suppress+warn in the calling code (under control 
of a strictness config param).




trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


@param, @return?



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Use correct Javadoc @param syntax, and add @return.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java


Use @param, @return



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java


For other plugin-loading code, we use JavaUtils.getClassLoader().  Should 
probably do the same here?




trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java


Don't use printStackTrace; log the exception instead.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java


See comments on StatsAggregator regarding Javadoc.  Also, 
s/statics/statistics/



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java


I don't think this warrants four exclamation marks.



trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java


Is it worth using a prepared statement here?  

Also, depending on the transaction isolation level, concurrent update 
attempts could result

[jira] Updated: (HIVE-307) "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name

2010-09-17 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-307:
---

  Status: Open  (was: Patch Available)
Assignee: Kirk True

Cancelling the patch because of a missing test case. Kirk, it would be great if 
you could resubmit with the test case. Otherwise the code looks fine to me.

Ashish

> "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the 
> same name
> --
>
> Key: HIVE-307
> URL: https://issues.apache.org/jira/browse/HIVE-307
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Zheng Shao
>Assignee: Kirk True
>Priority: Critical
> Attachments: HIVE-307.patch
>
>
> Failed with exception checkPaths: 
> /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
> exists
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask




[jira] Commented: (HIVE-307) "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name

2010-09-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910799#action_12910799
 ] 

Ashish Thusoo commented on HIVE-307:


Hi Kirk,

Thanks for the contribution. Can you add a simple test case with your patch?

Ashish

> "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the 
> same name
> --
>
> Key: HIVE-307
> URL: https://issues.apache.org/jira/browse/HIVE-307
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Zheng Shao
>Priority: Critical
> Attachments: HIVE-307.patch
>
>
> Failed with exception checkPaths: 
> /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
> exists
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910795#action_12910795
 ] 

Namit Jain commented on HIVE-558:
-

Thanks Paul, I will commit it if the tests pass

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple to parse, so that programs can extract the table location etc.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910786#action_12910786
 ] 

Joydeep Sen Sarma commented on HIVE-1651:
-

if a hadoop task has failed - how is it that any side-effect files created 
by hive code running in that task are getting promoted to the final output?

i think the forwarding is a red herring. we should not commit output files from 
a failed task.

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the 
> script and then forwards the output from stdout to downstream operators. If 
> the script fails (e.g., it is killed), the ScriptOperator gets an exception 
> and throws it to upstream operators until MapOperator gets it and calls 
> close(abort). Until ScriptOperator.close() is called, the script output 
> stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Created: (HIVE-1653) Ability to enforce correct stats

2010-09-17 Thread Namit Jain (JIRA)
Ability to enforce correct stats


 Key: HIVE-1653
 URL: https://issues.apache.org/jira/browse/HIVE-1653
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.

If one of the mappers/reducers cannot publish stats, it may lead to wrong 
aggregated stats.
There should be a way to avoid this - at the least, a configuration variable 
which fails the task if stats cannot be published.
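A sketch of what such enforcement could look like; the method and flag names are hypothetical, not an existing Hive configuration:

```java
public class StatsEnforceSketch {
    // Hypothetical stand-in for a "fail the task when stats cannot be
    // published" configuration flag (no such variable exists yet).
    static long aggregateRowCount(long[] perTaskCounts, boolean[] published,
                                  boolean failOnMissing) {
        long total = 0;
        for (int i = 0; i < perTaskCounts.length; i++) {
            if (!published[i]) {
                if (failOnMissing) {
                    throw new IllegalStateException("task " + i + " did not publish stats");
                }
                continue; // silently skipping is what makes the aggregate wrong
            }
            total += perTaskCounts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        long[] counts = {100, 200, 300};
        boolean[] ok = {true, false, true};
        // Without enforcement the aggregate silently undercounts:
        System.out.println(aggregateRowCount(counts, ok, false)); // 400, not 600
    }
}
```

With the flag on, the middle task's missing stats would raise an exception instead of producing a wrong total.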




[jira] Created: (HIVE-1652) Delete temporary stats data after some time

2010-09-17 Thread Namit Jain (JIRA)
Delete temporary stats data after some time
---

 Key: HIVE-1652
 URL: https://issues.apache.org/jira/browse/HIVE-1652
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.
If the client dies after some stats have been published, there is no way to 
clean that data.

A simple work-around might be to add the current timestamp to the data - and a 
background process to clean up old stats. 
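A sketch of the timestamp-based cleanup idea; the data layout and names are hypothetical, not the actual stats-publishing schema:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class StatsCleanupSketch {
    // Hypothetical layout: temporary stats entries keyed by task/partition id,
    // each tagged with its publish timestamp as suggested in the issue.
    static int purgeOlderThan(Map<String, Long> statsPublishTime,
                              long nowMillis, long ttlMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = statsPublishTime.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > ttlMillis) {
                it.remove(); // the publishing client died; reclaim its entry
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new HashMap<>();
        stats.put("task_1", 1_000L);   // stale entry left by a dead client
        stats.put("task_2", 90_000L);  // recently published
        System.out.println(purgeOlderThan(stats, 100_000L, 60_000L)); // 1
    }
}
```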




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910775#action_12910775
 ] 

Paul Yang commented on HIVE-558:


Looks good +1

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple for programs to parse to get the table location etc.




[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1534:
-

Status: Open  (was: Patch Available)

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.
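To make the expected behavior concrete: in a LEFT OUTER JOIN, a filter in the ON clause should only restrict which right-side rows match; unmatched left-side rows must still appear, NULL-padded. A minimal simulation of that semantics (the class and sample values are illustrative, not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

public class OuterJoinSketch {
    // Simulates: SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < limit)
    // Each result element is "leftValue,rightValueOrNULL".
    static List<String> leftOuterJoin(int[] t1, int[] t2, int filterLimit) {
        List<String> result = new ArrayList<>();
        for (int left : t1) {
            boolean matched = false;
            for (int right : t2) {
                // The ON-clause filter only gates whether a pair matches ...
                if (left == right && left < filterLimit) {
                    result.add(left + "," + right);
                    matched = true;
                }
            }
            // ... it must NOT drop unmatched left rows; they are NULL-padded.
            if (!matched) {
                result.add(left + ",NULL");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 matches and passes the filter; 20 matches but fails it; 7 has no match.
        List<String> rows = leftOuterJoin(new int[]{5, 20, 7}, new int[]{5, 20}, 10);
        System.out.println(rows); // [5,5, 20,NULL, 7,NULL] - all left rows survive
    }
}
```

Treating the ON-clause filter as a WHERE-clause filter (dropping the NULL-padded rows) is the kind of wrong result this issue describes.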




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910763#action_12910763
 ] 

Namit Jain commented on HIVE-1534:
--

You can clean up the patch by not special-casing partitioned columns. 
Otherwise, the patch looks good.

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910758#action_12910758
 ] 

Ning Zhang commented on HIVE-1378:
--

Taking a look now. 

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




Meeting notes from the September Hive Contributors Meeting

2010-09-17 Thread John Sichi
Meeting date:  Sept 13, 2010

Location:  Cloudera Palo Alto office

Attendees:  http://www.meetup.com/Hive-Contributors-Group/calendar/14689507/

Carl Steinbach gave a status update on the 0.6 release.  Since plans for 
documentation migration have been deferred to the next release, the only 
remaining issues are completion of the CREATE DATABASE feature (HIVE-675), 
metastore VARCHAR precision widening (HIVE-1364), and metastore upgrade scripts 
(HIVE-1427).  HIVE-675 has already been committed to trunk and the backport for 
0.6 is underway.  Carl is still working on companion feature HIVE-1517, but 
unless it is done by Sept 17, we'll drop it from the 0.6 release.  Once a build 
containing the remaining features passes all unit tests, we'll post a release 
candidate and vote on it (no additional acceptance testing is planned for this 
release).

Next, HIVE-1546 (making semantic analysis pluggable) was discussed.  The Howl 
team gave an overview of their use case for reusing Hive DDL processing within 
Howl's CLI, together with a description of the roadmap for Howl/Hive project 
relationship.  Carl raised concerns about potential dependency creep preventing 
future Hive refactoring, and Namit Jain proposed reworking the approach to 
restrict it to pre/post-analysis hooks (limiting the dependencies exposed) 
rather than full-blown analyzer pluggability+subclassing.  It was agreed that 
the hooks approach was the best course for balancing all of the concerns and 
allowing us to achieve the desired project collaboration benefits.  The Howl 
team also committed to getting their Howl checkins going into a public 
repository as soon as possible, together with setting up continuous integration 
to track the health of the combination of Hive+Howl trunks.

Next, HIVE-1609 (partition filtering metastore API) was briefly discussed, and 
it was agreed that the Howl team would move the predicate parser from JavaCC to 
ANTLR and resubmit the patch.

Finally, HIVE-1476 (metastore creating files as service user) was discussed.  
It was agreed that the approach in the proposed patch (performing HDFS 
operations on the metastore client side) was a stopgap that we don't really 
want to include in Hive.  Instead, the correct long-term solution being 
developed by Todd Lipcon is to upgrade the Thrift version used by Hive to a 
recent one containing his SASL support, and then add impersonation support to 
the metastore server.  Since the Howl team's schedule does not allow them to 
wait for that work to complete and get tested, they will keep the HIVE-1476 
patch privately applied within their own branch of Hive; it will not be 
committed on Hive trunk.  Once they are able to move over to the long-term 
solution, the stopgap can be discarded.  (In the meantime, we need to work 
together to minimize the merge-to-branch impact as the metastore code continues 
to change on trunk.)

The October meetup will be at Facebook HQ in Palo Alto.



[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910747#action_12910747
 ] 

Namit Jain commented on HIVE-1651:
--

+1

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the 
> script and then forwards the output from stdout to downstream operators. If 
> the script fails (e.g., it is killed), the ScriptOperator gets an exception 
> and throws it to upstream operators until MapOperator gets it and calls 
> close(abort). Until ScriptOperator.close() is called, the script output 
> stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Status: Patch Available  (was: Open)

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the 
> script and then forwards the output from stdout to downstream operators. If 
> the script fails (e.g., it is killed), the ScriptOperator gets an exception 
> and throws it to upstream operators until MapOperator gets it and calls 
> close(abort). Until ScriptOperator.close() is called, the script output 
> stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Attachment: HIVE-1651.patch

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the 
> script and then forwards the output from stdout to downstream operators. If 
> the script fails (e.g., it is killed), the ScriptOperator gets an exception 
> and throws it to upstream operators until MapOperator gets it and calls 
> close(abort). Until ScriptOperator.close() is called, the script output 
> stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Created: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)
ScriptOperator should not forward any output to downstream operators if an 
exception is happened


 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


ScriptOperator spawns 2 threads to read the stdout and stderr of the script and 
then forwards the output from stdout to downstream operators. If the script 
fails (e.g., it is killed), the ScriptOperator gets an exception and throws it 
to upstream operators until MapOperator gets it and calls close(abort). Until 
ScriptOperator.close() is called, the script output stream can still forward 
output to downstream operators. We should terminate it immediately.
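A minimal sketch of the proposed fix, with hypothetical names (the real ScriptOperator internals differ): the stdout-forwarding path checks an abort flag set from close(abort) and stops passing rows downstream:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class ForwarderSketch {
    // Illustrative stand-ins for the real ScriptOperator internals.
    final AtomicBoolean aborted = new AtomicBoolean(false);
    final List<String> forwarded = new ArrayList<>();

    // Called by the stdout-reader thread for each line read from the script.
    void forward(String row) {
        // The fix idea: once an abort is flagged, drop output instead of
        // forwarding it to downstream operators.
        if (aborted.get()) {
            return;
        }
        forwarded.add(row);
    }

    // Called when the script's failure propagates to close(abort).
    void abort() {
        aborted.set(true);
    }

    public static void main(String[] args) {
        ForwarderSketch op = new ForwarderSketch();
        op.forward("row1");
        op.abort();           // script died; exception reached close(abort)
        op.forward("row2");   // must be suppressed
        System.out.println(op.forwarded); // [row1]
    }
}
```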




[jira] Updated: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1650:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks Namit!

> TestContribNegativeCliDriver fails
> --
>
> Key: HIVE-1650
> URL: https://issues.apache.org/jira/browse/HIVE-1650
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1650.1.patch
>
>





[jira] Commented: (HIVE-1607) Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675

2010-09-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910732#action_12910732
 ] 

John Sichi commented on HIVE-1607:
--

I think we can re-close this one now since it was squashed into the backport 
for HIVE-675, right?

> Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
> 
>
> Key: HIVE-1607
> URL: https://issues.apache.org/jira/browse/HIVE-1607
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1607.1.patch.txt, HIVE-1607.2.patch.txt
>
>
> Several methods were removed from the IMetaStoreClient interface as part of 
> HIVE-675:
> {code}
>   /**
>* Drop the table.
>*
>* @param tableName
>*  The table to drop
>* @param deleteData
>*  Should we delete the underlying data
>* @throws MetaException
>*   Could not drop table properly.
>* @throws UnknownTableException
>*   The table wasn't found.
>* @throws TException
>*   A thrift communication error occurred
>* @throws NoSuchObjectException
>*   The table wasn't found.
>*/
>   public void dropTable(String tableName, boolean deleteData)
>   throws MetaException, UnknownTableException, TException,
>   NoSuchObjectException;
>   /**
>* Get a table object.
>*
>* @param tableName
>*  Name of the table to fetch.
>* @return An object representing the table.
>* @throws MetaException
>*   Could not fetch the table
>* @throws TException
>*   A thrift communication error occurred
>* @throws NoSuchObjectException
>*   In case the table wasn't found.
>*/
>   public Table getTable(String tableName) throws MetaException, TException,
>   NoSuchObjectException;
>   public boolean tableExists(String databaseName, String tableName) throws 
> MetaException,
>   TException, UnknownDBException;
> {code}
> These methods should be reinstated with a deprecation warning.
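A sketch of how a reinstated method could delegate to the new database-qualified form with a deprecation warning; the class, the string return value, and the use of "default" as the fallback database name are illustrative assumptions, not the actual IMetaStoreClient implementation:

```java
public class DeprecationSketch {
    // Stand-in for the new database-qualified API introduced by HIVE-675.
    static String dropTable(String dbName, String tableName, boolean deleteData) {
        return "dropped " + dbName + "." + tableName + " deleteData=" + deleteData;
    }

    /**
     * Reinstated single-table form, resolving against the default database.
     * @deprecated use {@link #dropTable(String, String, boolean)} instead
     */
    @Deprecated
    static String dropTable(String tableName, boolean deleteData) {
        // Delegate so old callers keep working while the compiler warns them.
        return dropTable("default", tableName, deleteData);
    }

    public static void main(String[] args) {
        System.out.println(dropTable("t1", true));
        // prints: dropped default.t1 deleteData=true
    }
}
```

The same pattern (a @Deprecated wrapper delegating to the new signature) would apply to getTable and tableExists.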




[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-17 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1615:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to branch and trunk.  Thanks Edward!


> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Blocker
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910723#action_12910723
 ] 

John Sichi commented on HIVE-1534:
--

+1 to what Namit said about partitioned columns...although I have seen this 
mistake come up a lot, it's better to have consistent semantics and just 
educate users.


> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910716#action_12910716
 ] 

HBase Review Board commented on HIVE-1378:
--

Message from: "Steven Wong" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/828/
---

(Updated 2010-09-17 11:00:43.924384)


Review request for Hive Developers.


Summary
---

HIVE-1606 is also fixed as a side effect.


This addresses bug HIVE-1378.
http://issues.apache.org/jira/browse/HIVE-1378


Diffs (updated)
-

  trunk/build.xml 997983 
  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 997983 
  trunk/data/files/datatypes.txt PRE-CREATION 
  trunk/data/scripts/input20_script 997983 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java 997983 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 997983 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 
997983 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveResultSetMetaData.java 
997983 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcColumn.java 997983 
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 997983 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 997983 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
997983 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 997983 
  trunk/ql/src/test/results/clientpositive/binary_output_format.q.out 997983 
  trunk/ql/src/test/results/compiler/plan/input20.q.xml 997983 
  trunk/ql/src/test/results/compiler/plan/input4.q.xml 997983 
  trunk/ql/src/test/results/compiler/plan/input5.q.xml 997983 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 
997983 

Diff: http://review.cloudera.org/r/828/diff


Testing
---


Thanks,

Steven




> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-1378:
--

Attachment: HIVE-1378.1.patch

Regenerated patch based on r997983.


> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910669#action_12910669
 ] 

Namit Jain commented on HIVE-1534:
--

bq. I think it makes sense to push the filters on partitioned columns and not 
output all the table for outer join. Patch pushes filters on partitioned 
columns, even for outer joins. Thoughts?

I don't think it is a good idea to special-case partitioned columns - can you 
treat them like any other column?



> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-268) "Insert Overwrite Directory" to accept configurable table row format

2010-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910656#action_12910656
 ] 

Edward Capriolo commented on HIVE-268:
--

Still not exactly what you want, but with CTAS you can essentially get a folder 
in /user/hive/warehouse/ with the format you want.

> "Insert Overwrite Directory" to accept configurable table row format
> 
>
> Key: HIVE-268
> URL: https://issues.apache.org/jira/browse/HIVE-268
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
>
> There is no way for the users to control the file format when they are 
> outputting the result into a directory.
> We should allow:
> {code}
> INSERT OVERWRITE DIRECTORY "/user/zshao/result"
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '9'
> SELECT tablea.* from tablea;
> {code}
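For reference, the '9' in the example is (if I read Hive's delimiter convention correctly - numeric delimiter strings are taken as byte values) the ASCII code for TAB. A small illustrative formatter, not Hive code, showing what the requested output would look like:

```java
public class DelimitedWriteSketch {
    // Joins fields with a single-character terminator, as ROW FORMAT
    // DELIMITED ... FIELDS TERMINATED BY would for each output row.
    static String formatRow(String[] fields, char terminator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(terminator);
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // FIELDS TERMINATED BY '9' => join fields with (char) 9, i.e. '\t'
        System.out.println(formatRow(new String[]{"a", "b", "c"}, (char) 9));
    }
}
```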




[jira] Commented: (HIVE-1226) support filter pushdown against non-native tables

2010-09-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910649#action_12910649
 ] 

He Yongqiang commented on HIVE-1226:


+1, will commit after tests pass.

> support filter pushdown against non-native tables
> -
>
> Key: HIVE-1226
> URL: https://issues.apache.org/jira/browse/HIVE-1226
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler, Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1226.1.patch, HIVE-1226.2.patch, HIVE-1226.3.patch, 
> HIVE-1226.4.patch
>
>
> For example, HBase's scan object can take filters.




[jira] Commented: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910648#action_12910648
 ] 

He Yongqiang commented on HIVE-1650:


+1, running tests.

> TestContribNegativeCliDriver fails
> --
>
> Key: HIVE-1650
> URL: https://issues.apache.org/jira/browse/HIVE-1650
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1650.1.patch
>
>





[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558.3.patch

Changelog:

MetaDataFormatUtils: fixed a bug where Location is null for views & added 
copyright information
DDLSemanticAnalyzer: use DescTableDesc.getSchema() as the method is static now.
QTestUtil: additional tags added
Test case outputs updated (incl. outputs for 0.17)

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple for programs to parse to get the table location etc.




[jira] Commented: (HIVE-268) "Insert Overwrite Directory" to accept configurable table row format

2010-09-17 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910530#action_12910530
 ] 

Arun commented on HIVE-268:
---

Hi Team,

Have you found any solution for this issue, or is there any other way to 
achieve the above?

Thanks,
Jak

> "Insert Overwrite Directory" to accept configurable table row format
> 
>
> Key: HIVE-268
> URL: https://issues.apache.org/jira/browse/HIVE-268
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
>
> There is no way for the users to control the file format when they are 
> outputting the result into a directory.
> We should allow:
> {code}
> INSERT OVERWRITE DIRECTORY "/user/zshao/result"
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '9'
> SELECT tablea.* from tablea;
> {code}




[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1534:
--

Status: Patch Available  (was: Open)

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1534:
--

Attachment: patch-1534-2.txt

bq. What about semi joins ? 
I did not find anything wrong with semijoin queries and filters. They can be 
pushed as they are now. I don't think any change is required for semi joins. 
What do you think?

Uploading patch with following changes from earlier one:
* I think it makes sense to push the filters on partitioned columns and not 
output all the table for outer join. Patch pushes filters on partitioned 
columns, even for outer joins. Thoughts?
* Removed duplicate code in genExprNode() in SemanticAnalyzer.
* Fixed some minor bugs in SemanticAnalyzer and CommonJoinOperator, found in 
the test failures.
* Updated the test results for clientpositive queries in join20.q.out, 
join21.q.out and join40.q.out, which involve filters on outer joins. Also, 
updated test results for TestParse join queries.

All the tests passed with the uploaded patch.


> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.
