[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1691:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Ning

 ANALYZE TABLE command should check columns in partition spec
 

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1691.patch


 ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
 are partition columns.
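
Not from the attached patch; a minimal Java sketch of the kind of check described above, assuming the analyzer already has the table's partition column names (lower-cased) and the user-supplied partition spec:

{code}
import java.util.Map;
import java.util.Set;

// Hypothetical helper: reject any column in the PARTITION(...) spec that is not
// one of the table's partition columns.
public final class PartitionSpecChecker {
  public static void validate(Set<String> partitionCols, Map<String, String> partSpec) {
    for (String col : partSpec.keySet()) {
      if (!partitionCols.contains(col.toLowerCase())) {
        throw new IllegalArgumentException(col + " is not a partition column");
      }
    }
  }
}
{code}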

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918603#action_12918603
 ] 

Namit Jain commented on HIVE-1695:
--

This would be a very useful optimization

 MapJoin followed by ReduceSink should be done as single MapReduce Job
 -

 Key: HIVE-1695
 URL: https://issues.apache.org/jira/browse/HIVE-1695
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu

 Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map 
 only job followed by a Map-Reduce job. It can be combined into single 
 MapReduce Job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1364) Increase the maximum length of various metastore fields, and remove TYPE_NAME from COLUMNS primary key

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1364:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to both 0.6 and trunk. Thanks Carl

 Increase the maximum length of various metastore fields, and remove TYPE_NAME 
 from COLUMNS primary key
 --

 Key: HIVE-1364
 URL: https://issues.apache.org/jira/browse/HIVE-1364
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: HIVE-1364.2.patch.txt, 
 HIVE-1364.3.backport-060.patch.txt, HIVE-1364.3.patch.txt, 
 HIVE-1364.4.backport-060.patch.txt, HIVE-1364.4.patch.txt, HIVE-1364.patch


 The value component of a SERDEPROPERTIES key/value pair is currently limited
 to a maximum length of 767 characters. I believe that the motivation for 
 limiting the length to 
 767 characters is that this value is the maximum allowed length of an index in
 a MySQL database running on the InnoDB engine: 
 http://bugs.mysql.com/bug.php?id=13315
 * The Metastore OR mapping currently limits many fields (including 
 SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
 the fact that these fields are not indexed.
 * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
 * We can expect many users to hit the 767 character limit on 
 SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
 serdeproperty to map a table that has many columns.
 I propose increasing the maximum allowed length of 
 SERDEPROPERTIES.PARAM_VALUE to 8192.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1693) Make the compile target depend on thrift.home

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918679#action_12918679
 ] 

Namit Jain commented on HIVE-1693:
--

+1

 Make the compile target depend on thrift.home
 -

 Key: HIVE-1693
 URL: https://issues.apache.org/jira/browse/HIVE-1693
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.5.0
Reporter: Eli Collins
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-1693-1.patch


 Per http://wiki.apache.org/hadoop/Hive/HiveODBC the ant compile targets 
 require thrift.home to be set. Rather than failing to compile, the build should 
 fail with a message indicating that it should be set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1693) Make the compile target depend on thrift.home

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1693.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to both 0.6 and trunk. Thanks Eli

 Make the compile target depend on thrift.home
 -

 Key: HIVE-1693
 URL: https://issues.apache.org/jira/browse/HIVE-1693
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.5.0
Reporter: Eli Collins
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-1693-1.patch


 Per http://wiki.apache.org/hadoop/Hive/HiveODBC the ant compile targets 
 require thrift.home to be set. Rather than failing to compile, the build should 
 fail with a message indicating that it should be set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1427.
--

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]

Committed in both 0.6 and trunk. Thanks Carl

 Provide metastore schema migration scripts (0.5 - 0.6)
 ---

 Key: HIVE-1427
 URL: https://issues.apache.org/jira/browse/HIVE-1427
 Project: Hadoop Hive
  Issue Type: Task
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: HIVE-1427.1.patch.txt


 At a minimum this ticket covers packaging up example MySQL migration scripts 
 (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
 do with them in the release notes.
 This is also probably a good point at which to decide and clearly state which 
 Metastore DBs we officially support in production, e.g. do we need to provide 
 migration scripts for Derby?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1697) Migration scripts should increase size of PARAM_VALUE in PARTITION_PARAMS

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1697:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
   0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed in trunk and 0.6 - Thanks Paul

 Migration scripts should increase size of PARAM_VALUE in PARTITION_PARAMS
 -

 Key: HIVE-1697
 URL: https://issues.apache.org/jira/browse/HIVE-1697
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0, 0.7.0

 Attachments: HIVE-1697.1.patch


 The migration scripts should increase the size of column PARAM_VALUE in the 
 table PARTITION_PARAMS to 4000 chars to follow the description in package.jdo.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918740#action_12918740
 ] 

Namit Jain commented on HIVE-537:
-

Otherwise it looks good to me

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537-2.txt, 
 patch-537-3.txt, patch-537-4.txt, patch-537.txt


 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int, 1:double, 2:array<string>, 3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte; then we know the 
 current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs, one for each of the tags. */
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object. */
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the object. */
   Object getField(Object o);
 };
 An example serialization format (using a delimited format, with ' ' as the 
 first-level delimiter and '=' as the second-level delimiter):
 userid:int, log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}
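
 Not part of the proposal above, just a standalone Java sketch of the tag-byte idea: the tag is written before the value and read back first, so the reader knows which member type follows (here tag 0 = int, tag 1 = string, chosen only for illustration).

{code}
import java.io.*;

// Illustration of tag-prefixed union serialization with a two-member union.
public class UnionTagDemo {
  static void write(DataOutput out, byte tag, Object value) throws IOException {
    out.writeByte(tag);                      // store the tag byte before the object
    if (tag == 0) {
      out.writeInt((Integer) value);
    } else {
      out.writeUTF((String) value);
    }
  }

  static Object read(DataInput in) throws IOException {
    byte tag = in.readByte();                // read the tag first to pick the type
    return (tag == 0) ? Integer.valueOf(in.readInt()) : in.readUTF();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    write(out, (byte) 1, "login");
    write(out, (byte) 0, 243);
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    System.out.println(read(in));            // login
    System.out.println(read(in));            // 243
  }
}
{code}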

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1691:
-

Summary: ANALYZE TABLE command should check columns in partition spec  
(was: ANALYZE TABLE command should check columns in partitioin spec)

 ANALYZE TABLE command should check columns in partition spec
 

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1691.patch


 ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
 are partition columns.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918359#action_12918359
 ] 

Namit Jain commented on HIVE-1691:
--

I will take a look

 ANALYZE TABLE command should check columns in partition spec
 

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1691.patch


 ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
 are partition columns.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1647:
-

Status: Open  (was: Patch Available)

 Incorrect initialization of thread local variable inside IOContext ( 
 implementation is not threadsafe ) 
 

 Key: HIVE-1647
 URL: https://issues.apache.org/jira/browse/HIVE-1647
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0, 0.7.0
Reporter: Raman Grover
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: HIVE-1647.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 Bug in org.apache.hadoop.hive.ql.io.IOContext
 in relation to initialization of thread local variable.
  
 public class IOContext {
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>(){ };
  
   static {
     if (threadLocal.get() == null) {
       threadLocal.set(new IOContext());
     }
   }
  
 In a multi-threaded environment, the thread that loads the class first 
 for the JVM (assuming the threads share the classloader) 
 gets to initialize itself correctly by executing the code in the static 
 block. Once the class is loaded, 
 any subsequent threads would have their respective thread-local variable as 
 null. Since IOContext 
 is set during initialization of HiveRecordReader, in a scenario where 
 multiple threads acquire 
 an instance of HiveRecordReader, this would result in an NPE for all but the 
 first thread that loaded the class in the VM.
  
 Is the above scenario of multiple threads initializing HiveRecordReader a 
 typical one? Or we could just provide the following fix...
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>() {
     protected synchronized IOContext initialValue() {
       return new IOContext();
     }
   };
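
 A self-contained sketch of the fix suggested above, assuming nothing beyond the JDK: overriding initialValue() gives every thread its own IOContext lazily, so the static block is no longer needed (the synchronized keyword in the proposal is optional, since initialValue() runs once per thread).

{code}
// Minimal illustration of the proposed fix for the thread-local initialization.
public class IOContextHolder {
  static class IOContext { /* per-thread I/O state elided */ }

  private static final ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>() {
    @Override
    protected IOContext initialValue() {
      return new IOContext();   // created lazily, once per thread
    }
  };

  public static IOContext get() {
    return threadLocal.get();   // never null, even for threads that did not load the class
  }
}
{code}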

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917677#action_12917677
 ] 

Namit Jain commented on HIVE-1546:
--

I will take a look in more detail, but overall it looks good. I had the 
following comments:

1. Instead of TestSemanticAnalyzerHookLoading.java, add tests in 
test/queries/clientpositive and test/queries/clientnegative.
2. Do you want to set the value of hive.semantic.analyzer.hook to a dummy value 
in data/conf/hive-site.xml for the unit tests?
Can something meaningful be printed here, which can be used for comparison?


 Ability to plug custom Semantic Analyzers for Hive Grammar
 --

 Key: HIVE-1546
 URL: https://issues.apache.org/jira/browse/HIVE-1546
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
 hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt


 It will be useful if Semantic Analysis phase is made pluggable such that 
 other projects can do custom analysis of hive queries before doing metastore 
 operations on them. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917684#action_12917684
 ] 

Namit Jain commented on HIVE-1678:
--

Nice catch - Thanks

+1

will commit if the tests pass

 NPE in MapJoin 
 ---

 Key: HIVE-1678
 URL: https://issues.apache.org/jira/browse/HIVE-1678
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1678.txt


 The query with two map joins and a group by fails with following NPE:
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1678:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Amareshwari

 NPE in MapJoin 
 ---

 Key: HIVE-1678
 URL: https://issues.apache.org/jira/browse/HIVE-1678
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1678.txt


 The query with two map joins and a group by fails with following NPE:
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Namit Jain (JIRA)
intermittent failures in create_escape.q


 Key: HIVE-1684
 URL: https://issues.apache.org/jira/browse/HIVE-1684
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang


[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
[junit] 48d47
[junit] <      serialization.format\t  
[junit] 49a49
[junit] >      serialization.format\t  


Sometimes, I see the above failure. 

This does not happen always, and needs to be investigated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1685) scriptfile1.1 in minimr failing intermittently

2010-10-01 Thread Namit Jain (JIRA)
scriptfile1.1 in minimr failing intermittently
-

 Key: HIVE-1685
 URL: https://issues.apache.org/jira/browse/HIVE-1685
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


 [junit] Begin query: scriptfile1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/scriptfile1.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/scriptfile1.q.out
[junit] 1c1
[junit]  PREHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit]  PREHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 3c3
[junit]  POSTHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit]  POSTHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 5c5
[junit]  POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] ---
[junit]  POSTHOOK: Output: defa...@dest1
[junit] 12c12
[junit]  INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] ---
[junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
[junit]  INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] See build/ql/tmp/hive.log, or try ant test ... 
-Dtest.silent=false to get more logs.
[junit] 15c15
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit]  PREHOOK: Output: defa...@scriptfile1_dest1
[junit] at 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:522)
[junit] ---
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]  PREHOOK: Output: defa...@dest1
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] 22c22
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]  INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] ---
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit]  INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] 25,28c25,28
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit]  POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit]  POSTHOOK: Lineage: scriptfile1_dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit]  POSTHOOK: Lineage: scriptfile1_dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit]  PREHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_dest1
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] ---
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit]  POSTHOOK: Output: defa...@dest1
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit]  POSTHOOK: Lineage: dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit]  POSTHOOK: Lineage: dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit]  PREHOOK: query: SELECT dest1.* FROM dest1
[junit] 30,32c30,32
[junit]  PREHOOK: Input: defa...@scriptfile1_dest1
[junit]  PREHOOK: Output: 
hdfs://localhost.localdomain:59220/data/users/njain/hive_commit1/hive_commit1/build/ql/scratchdir/hive_2010-09-30_01-24-37_987_7722845044472176538/-mr-1
[junit]  POSTHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_dest1
[junit] ---

[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916951#action_12916951
 ] 

Namit Jain commented on HIVE-1673:
--

scriptfile1.q and smb_mapjoin_8.q in TestMinimrCliDriver are also unrelated - 
filed independent jiras for them as well.


 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1683) Column aliases cannot be used in a group by clause

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916953#action_12916953
 ] 

Namit Jain commented on HIVE-1683:
--

A workaround is to use the original expression:

select col1, count(col2) from test group by col1;

 Column aliases cannot be used in a group by clause
 --

 Key: HIVE-1683
 URL: https://issues.apache.org/jira/browse/HIVE-1683
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shrikrishna Lawande

 Column aliases cannot be used in a group by clause
 Following query would fail :
 select col1 as t, count(col2) from test group by t;
 FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
 Reference t

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1687) smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails

2010-10-01 Thread Namit Jain (JIRA)
smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails
--

 Key: HIVE-1687
 URL: https://issues.apache.org/jira/browse/HIVE-1687
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


The test never seems to succeed for me, although it is OK for many other people

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1673:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Yongqiang

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 column selected

2010-09-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1670:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Ning

 MapJoin throws EOFException when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916470#action_12916470
 ] 

Namit Jain commented on HIVE-1673:
--

The following tests failed:

create_escape.q


and 


scriptfile1.q
smb_mapjoin_8.q

in TestMinimrCliDriver

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916734#action_12916734
 ] 

Namit Jain commented on HIVE-1667:
--

There may be a superuser, say root, who is the owner of the parent dir (which 
will be the warehouse dir).

If I create a table T, the group should be my group, not root. The current 
BSD semantics force the group to be root.

 Store the group of the owner of the table in metastore
 --

 Key: HIVE-1667
 URL: https://issues.apache.org/jira/browse/HIVE-1667
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
 Attachments: hive-1667.patch


 Currently, the group of the owner of the table is not stored in the metastore.
 Secondly, if you create a table, the table's owner group is set to the group 
 for the parent. It is not read from the UGI passed in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 column selected

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916205#action_12916205
 ] 

Namit Jain commented on HIVE-1670:
--

+1

 MapJoin throws EOFException when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916207#action_12916207
 ] 

Namit Jain commented on HIVE-1658:
--

Thiruvel, any updates on this - we need it urgently in order to deploy HIVE-558


 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan

 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916208#action_12916208
 ] 

Namit Jain commented on HIVE-1638:
--

great results - I will review the patch

 convert commonly used udfs to generic udfs
 --

 Key: HIVE-1638
 URL: https://issues.apache.org/jira/browse/HIVE-1638
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE-1638.1.patch


 Copying a mail from Joy:
 i did a little bit of profiling of a simple hive group by query today. i was 
 surprised to see that one of the most expensive functions was in converting 
 the equals udf (i had some simple string filters) to generic udfs 
 (primitiveobjectinspectorconverter.textconverter)
 am i correct in thinking that the fix is to simply port some of the most 
 popular udfs (string equality/comparison etc.) to generic udfs?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915856#action_12915856
 ] 

Namit Jain commented on HIVE-1642:
--

yes

 Convert join queries to map-join based on size of table/row
 ---

 Key: HIVE-1642
 URL: https://issues.apache.org/jira/browse/HIVE-1642
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0


 Based on the number of rows and size of each table, Hive should automatically 
 be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915440#action_12915440
 ] 

Namit Jain commented on HIVE-1671:
--

Are your using HiveServer ?

bq. we having 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true, in 
which case I can see the problem happening?

 multithreading on Context.pathToCS
 --

 Key: HIVE-1671
 URL: https://issues.apache.org/jira/browse/HIVE-1671
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.7.0

 Attachments: HIVE-1671-1.patch


 We have 2 threads running at 100%.
 With a stacktrace like this:
 Thread-16725 prio=10 tid=0x7ff410662000 nid=0x497d runnable 
 [0x442eb000]
java.lang.Thread.State: RUNNABLE
 at java.util.HashMap.get(HashMap.java:303)
 at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
 at 
 org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
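
 Threads spinning at 100% inside HashMap.get are a classic symptom of an unsynchronized HashMap shared across threads. The sketch below is only one illustrative way to make such a path-to-size cache thread-safe (a ConcurrentHashMap, with the ContentSummary simplified to a long); it is not necessarily what the attached patch does.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative path -> size cache shared across threads. ConcurrentHashMap avoids
// the infinite loops a plain HashMap can enter when resized by concurrent writers.
public class PathSummaryCache {
  private final ConcurrentMap<String, Long> pathToSize = new ConcurrentHashMap<String, Long>();

  public Long get(String path) {
    return pathToSize.get(path);
  }

  public void put(String path, long size) {
    pathToSize.put(path, size);
  }
}
{code}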

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1157) UDFs can't be loaded via add jar when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915469#action_12915469
 ] 

Namit Jain commented on HIVE-1157:
--

This is good to have - I will take a look

 UDFs can't be loaded via add jar when jar is on HDFS
 --

 Key: HIVE-1157
 URL: https://issues.apache.org/jira/browse/HIVE-1157
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Philip Zeyliger
Priority: Minor
 Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
 HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
 output.txt


 As discussed on the mailing list, it would be nice if you could use UDFs that 
 are on jars on HDFS.  The proposed implementation would be for add jar to 
 recognize that the target file is on HDFS, copy it locally, and load it into 
 the classpath.
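
 A rough sketch (not the attached patch) of the "copy to local, then load" approach: if the jar URI is on HDFS it is copied to a local temp file with the Hadoop FileSystem API and the local copy is put on the classpath via a URLClassLoader. The class name and temp-file naming are hypothetical.

{code}
import java.io.File;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: make a remote (HDFS) jar usable by copying it locally first.
public class RemoteJarLoader {
  public static ClassLoader addJar(String jarUri, ClassLoader parent) throws Exception {
    URI uri = new URI(jarUri);
    File localJar;
    if ("hdfs".equals(uri.getScheme())) {
      FileSystem fs = FileSystem.get(uri, new Configuration());
      localJar = File.createTempFile("hive-aux-", ".jar");
      fs.copyToLocalFile(new Path(uri), new Path(localJar.getAbsolutePath()));
    } else {
      localJar = new File(uri.getPath());
    }
    // Load the local copy into the classpath.
    return new URLClassLoader(new URL[] { localJar.toURI().toURL() }, parent);
  }
}
{code}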
 {quote}
 Hi folks,
 I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
 Can you use a UDF where the jar which contains the function is on HDFS, and 
 not on the local filesystem.  Specifically, the following does not seem to 
 work:
 # This is Hive 0.5, from svn
 $ bin/hive  
 Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
 hive> add jar hdfs://localhost/FooTest.jar;   

 Added hdfs://localhost/FooTest.jar to class path
 hive> create temporary function cube as 'com.cloudera.FooTestUDF';
 
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.FunctionTask
 Does this work for other people?  I could probably fix it by changing add 
 jar to download remote jars locally, when necessary (to load them into the 
 classpath), or update URLClassLoader (or whatever is underneath there) to 
 read directly from HDFS, which seems a bit more fragile.  But I wanted to 
 make sure that my interpretation of what's going on is right before I have at 
 it.
 Thanks,
 -- Philip
 {quote}
 {quote}
 Yes that's correct. I prefer to download the jars in add jar.
 Zheng
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915508#action_12915508
 ] 

Namit Jain commented on HIVE-1671:
--

OK, I can now see the problem.

+1


 multithreading on Context.pathToCS
 --

 Key: HIVE-1671
 URL: https://issues.apache.org/jira/browse/HIVE-1671
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.7.0

 Attachments: HIVE-1671-1.patch


 We have 2 threads running at 100%.
 With a stacktrace like this:
 Thread-16725 prio=10 tid=0x7ff410662000 nid=0x497d runnable 
 [0x442eb000]
java.lang.Thread.State: RUNNABLE
 at java.util.HashMap.get(HashMap.java:303)
 at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
 at 
 org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915516#action_12915516
 ] 

Namit Jain commented on HIVE-1667:
--

Is it true that the first group is always the primary group (like Unix)?
If org.apache.hadoop.security.UserGroupInformation guarantees that, your 
approach seems correct.

One more thing to check: what happens in case of creation of a partition via 
insert overwrite?
The temp folder is moved to the final folder, so you may have to fix it there 
as well.

 Store the group of the owner of the table in metastore
 --

 Key: HIVE-1667
 URL: https://issues.apache.org/jira/browse/HIVE-1667
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
 Attachments: hive-1667.patch


 Currently, the group of the owner of the table is not stored in the metastore.
 Secondly, if you create a table, the table's owner group is set to the group 
 for the parent. It is not read from the UGI passed in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1671.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Bennie

 multithreading on Context.pathToCS
 --

 Key: HIVE-1671
 URL: https://issues.apache.org/jira/browse/HIVE-1671
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.7.0

 Attachments: HIVE-1671-1.patch


 We have 2 threads running at 100%.
 With a stacktrace like this:
 Thread-16725 prio=10 tid=0x7ff410662000 nid=0x497d runnable 
 [0x442eb000]
java.lang.Thread.State: RUNNABLE
 at java.util.HashMap.get(HashMap.java:303)
 at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
 at 
 org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
 at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1157) UDFs can't be loaded via add jar when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915619#action_12915619
 ] 

Namit Jain commented on HIVE-1157:
--

The changes looked good, but I got the following error:

[junit] Begin query: alter1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit2/hive_commit2/build/ql/test/logs/clientpositive/alter1.q.out
 
/data/users/njain/hive_commit2/hive_commit2/ql/src/test/results/clientpositive/alter1.q.out
[junit] 778d777
[junit] <  Resource ../data/files/TestSerDe.jar already added.


Philip, can you take care of that ?

 UDFs can't be loaded via add jar when jar is on HDFS
 --

 Key: HIVE-1157
 URL: https://issues.apache.org/jira/browse/HIVE-1157
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Philip Zeyliger
Priority: Minor
 Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
 HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
 output.txt


 As discussed on the mailing list, it would be nice if you could use UDFs that 
 are on jars on HDFS.  The proposed implementation would be for add jar to 
 recognize that the target file is on HDFS, copy it locally, and load it into 
 the classpath.
 {quote}
 Hi folks,
 I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
 Can you use a UDF where the jar which contains the function is on HDFS, and 
 not on the local filesystem.  Specifically, the following does not seem to 
 work:
 # This is Hive 0.5, from svn
 $ bin/hive  
 Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
 hive> add jar hdfs://localhost/FooTest.jar;   

 Added hdfs://localhost/FooTest.jar to class path
 hive> create temporary function cube as 'com.cloudera.FooTestUDF';
 
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.FunctionTask
 Does this work for other people?  I could probably fix it by changing add 
 jar to download remote jars locally, when necessary (to load them into the 
 classpath), or update URLClassLoader (or whatever is underneath there) to 
 read directly from HDFS, which seems a bit more fragile.  But I wanted to 
 make sure that my interpretation of what's going on is right before I have at 
 it.
 Thanks,
 -- Philip
 {quote}
 {quote}
 Yes that's correct. I prefer to download the jars in add jar.
 Zheng
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1666) retry metadata operation in case of a failure

2010-09-23 Thread Namit Jain (JIRA)
retry metadata operation in case of a failure
--

 Key: HIVE-1666
 URL: https://issues.apache.org/jira/browse/HIVE-1666
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang


If a user is trying to insert into a partition,

insert overwrite table T partition (p) select ..


it is possible that the directory gets created, but the metadata creation of 
t...@p fails - 
currently, we will just throw an error. The final directory has been created.

It will be useful to at least retry the metadata operation. 
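
Not from this issue; a generic sketch of the kind of retry wrapper that could be placed around the metadata call, with the attempt count and back-off picked arbitrarily for illustration.

{code}
import java.util.concurrent.Callable;

// Illustrative retry wrapper: re-attempt an operation a few times with a fixed
// back-off before rethrowing the last failure. Assumes maxAttempts >= 1.
public final class RetryingCall {
  public static <T> T run(Callable<T> op, int maxAttempts, long sleepMs) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs);   // simple fixed back-off between attempts
        }
      }
    }
    throw last;
  }
}
{code}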

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-23 Thread Namit Jain (JIRA)
Store the group of the owner of the table in metastore
--

 Key: HIVE-1667
 URL: https://issues.apache.org/jira/browse/HIVE-1667
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain


Currently, the group of the owner of the table is not stored in the metastore.

Secondly, if you create a table, the table's owner group is set to the group 
for the parent. It is not read from the UGI passed in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913662#action_12913662
 ] 

Namit Jain commented on HIVE-1658:
--

@Thiruvel, can we keep the new output in the old format?
I mean, we just have to make sure that the output has 3 columns separated by a 
delimiter.

So, if your current output is 'x', you can replace it with:

<TAB>x<TAB>

An implicit null at the beginning and end.




 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan

 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1661) Default values for parameters

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913726#action_12913726
 ] 

Namit Jain commented on HIVE-1661:
--

Yongqiang, can you take a look at this ?

 Default values for parameters
 -

 Key: HIVE-1661
 URL: https://issues.apache.org/jira/browse/HIVE-1661
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0

 Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch


 It would be good to have a default value for some hive parameters:
 say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913783#action_12913783
 ] 

Namit Jain commented on HIVE-1361:
--

Ning, the latest patch contains the output of svn stat

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 At the first step, we gather table-level stats for non-partitioned table and 
 partition-level stats for partitioned table. Future work could extend the 
 table level stats to partitioned table as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912851#action_12912851
 ] 

Namit Jain commented on HIVE-1609:
--

Once this is in, it would be useful to add an API like get_partitions_ps in Hive 
- I mean, get all sub-partitions.

For example, if the table is partitioned on (ds, hr), 
something like

show partitions (ds='2010-09-20', hr) should return all sub-partitions.

 Support partition filtering in metastore
 

 Key: HIVE-1609
 URL: https://issues.apache.org/jira/browse/HIVE-1609
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ajay Kidave
Assignee: Ajay Kidave
 Fix For: 0.7.0

 Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch


 The metastore needs to have support for returning a list of partitions based 
 on user specified filter conditions. This will be useful for tools which need 
 to do partition pruning. Howl is one such use case. The way partition pruning 
 is done during hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913001#action_12913001
 ] 

Namit Jain commented on HIVE-1609:
--

I meant, exposing it via the Hive QL directly.
I don't think there is a way to do that currently.

 Support partition filtering in metastore
 

 Key: HIVE-1609
 URL: https://issues.apache.org/jira/browse/HIVE-1609
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ajay Kidave
Assignee: Ajay Kidave
 Fix For: 0.7.0

 Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch


 The metastore needs to have support for returning a list of partitions based 
 on user specified filter conditions. This will be useful for tools which need 
 to do partition pruning. Howl is one such use case. The way partition pruning 
 is done during hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913105#action_12913105
 ] 

Namit Jain commented on HIVE-1534:
--

+1

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534-4.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1661) Default values for parameters

2010-09-21 Thread Namit Jain (JIRA)
Default values for parameters
-

 Key: HIVE-1661
 URL: https://issues.apache.org/jira/browse/HIVE-1661
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0


It would be good to have a default value for some hive parameters:

say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1534:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Amareshwari

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534-4.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913317#action_12913317
 ] 

Namit Jain commented on HIVE-1655:
--

+1

 Adding consistency check at jobClose() when committing dynamic partitions
 -

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1655.patch


 In case of a dynamic partition insert, FileSinkOperator generates a directory 
 for a new partition and the files in the directory are named '_tmp*'. 
 When a task succeeds, the file is renamed to remove the _tmp prefix, which 
 essentially implements the commit semantics. A lot of exceptions (the process 
 got killed, the machine died, etc.) could leave the _tmp files behind 
 in the DP directory. These _tmp files should be deleted (rolled back) 
 at a successful jobClose(). After the deletion, we should also delete any empty 
 directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1655:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

committed. Thanks Ning

 Adding consistency check at jobClose() when committing dynamic partitions
 -

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1655.patch


 In case of a dynamic partition insert, FileSinkOperator generates a directory 
 for a new partition and the files in the directory are named '_tmp*'. 
 When a task succeeds, the file is renamed to remove the _tmp prefix, which 
 essentially implements the commit semantics. A lot of exceptions (the process 
 got killed, the machine died, etc.) could leave the _tmp files behind 
 in the DP directory. These _tmp files should be deleted (rolled back) 
 at a successful jobClose(). After the deletion, we should also delete any empty 
 directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912585#action_12912585
 ] 

Namit Jain commented on HIVE-558:
-

Running tests again - will commit if it passes

 describe extended table/partition output is cryptic
 ---

 Key: HIVE-558
 URL: https://issues.apache.org/jira/browse/HIVE-558
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Chakka
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-558.3.patch, HIVE-558.4.patch, HIVE-558.patch, 
 HIVE-558.patch, HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt


 describe extended table prints out the Thrift metadata object directly. The 
 information from it is not easy to read or parse. The output should be easy to 
 read and simple to parse, so that programs can extract the table location etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912590#action_12912590
 ] 

Namit Jain commented on HIVE-1534:
--

I will take a look again

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-558) describe extended table/partition output is cryptic

2010-09-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-558.
-

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Thiruvel

 describe extended table/partition output is cryptic
 ---

 Key: HIVE-558
 URL: https://issues.apache.org/jira/browse/HIVE-558
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Chakka
Assignee: Thiruvel Thirumoolan
 Fix For: 0.7.0

 Attachments: HIVE-558.3.patch, HIVE-558.4.patch, HIVE-558.patch, 
 HIVE-558.patch, HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt


 describe extended table prints out the Thrift metadata object directly. The 
 information from it is not easy to read or parse. The output should be easy to 
 read and simple to parse, so that programs can extract the table location etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912694#action_12912694
 ] 

Namit Jain commented on HIVE-1497:
--

btw, HIVE-558 just got committed.

 support COMMENT clause on CREATE INDEX, and add new commands for 
 SHOW/DESCRIBE indexes
 --

 Key: HIVE-1497
 URL: https://issues.apache.org/jira/browse/HIVE-1497
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
 Fix For: 0.7.0


 We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into 
 account.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912741#action_12912741
 ] 

Namit Jain commented on HIVE-1534:
--

The patch looks good - however, we have a deployment issue.

This is an incompatible change, and will change/break existing queries. I can't 
think of a great way of getting this in.
One option is to guard it with a configurable parameter (it is ON by default). 
For internal deployments (like Facebook),
we can turn it off, find all the bad queries, convert them slowly, and 
only then enable this.


 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912845#action_12912845
 ] 

Namit Jain commented on HIVE-1534:
--

What I meant to say was the following:


People are running queries in the warehouse that rely on the current (wrong) 
semantics - if we suddenly fix this, those queries will break.
We need to give everyone some time to change their queries to use a 
sub-query if they want the filter to be pushed up.

Adding the above config parameter seems like the only choice - we can try to 
remove this parameter before 0.7 goes out 
(if everyone agrees), but we need it right now for deployment.
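
As a sketch of the kind of rewrite users would have to do (using the T1/T2 tables 
from the description below; the filter value is just the one from that example):

-- old query, relying on the ON-clause filter being pushed above the join:
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2 AND T1.c1 < 10);

-- rewrite that keeps the old behaviour once the fix is in: pre-filter T1 in a
-- sub-query so that only rows with c1 < 10 ever reach the join
SELECT * FROM (SELECT * FROM T1 WHERE T1.c1 < 10) t1
LEFT OUTER JOIN T2 ON (t1.c1 = T2.c2);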

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912846#action_12912846
 ] 

Namit Jain commented on HIVE-1658:
--

Thiruvel, this is a show-stopper for HIVE-558.
The schema for describe and describe extended cannot be changed.

You can add NULLs at the beginning/end, but the number of columns has to be 
maintained.

 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan

 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.
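
To make the expected layout concrete (a sketch against a hypothetical table, not 
output taken from the patch):

DESCRIBE page_views;
-- each column should come back as one tab-separated line, name<TAB>type<TAB>comment:
-- user_id<TAB>bigint<TAB>id of the viewer
-- ds<TAB>string<TAB>partition column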

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912849#action_12912849
 ] 

Namit Jain commented on HIVE-1620:
--

I will take a look and get back to you

 Patch to write directly to S3 from Hive
 ---

 Key: HIVE-1620
 URL: https://issues.apache.org/jira/browse/HIVE-1620
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1620.patch


 We want to submit a patch to Hive which allows users to write files directly 
 to S3.
 This patch allows users to specify an S3 location as the table output location 
 and hence eliminates the need to copy data from HDFS to S3.
 Users can run Hive queries directly over the data stored in S3.
 This patch helps integrate Hive with S3 better and more quickly.
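
At the query level, the intent is roughly the following (a sketch; the bucket, 
paths and table names are placeholders, not taken from the patch):

-- declare an output table whose data lives directly on S3
CREATE EXTERNAL TABLE s3_out (user_id BIGINT, cnt BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://my-bucket/hive/output/';

-- the insert writes straight to the S3 location, with no HDFS-to-S3 copy step
INSERT OVERWRITE TABLE s3_out
SELECT user_id, COUNT(1) FROM events GROUP BY user_id;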

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910903#action_12910903
 ] 

Namit Jain commented on HIVE-1617:
--

Committed. Thanks Paul

 ScriptOperator's AutoProgressor can lead to an infinite loop
 

 Key: HIVE-1617
 URL: https://issues.apache.org/jira/browse/HIVE-1617
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang
 Fix For: 0.7.0

 Attachments: HIVE-1617.1.patch


 In the default settings, the auto progressor can result in an infinite loop.
 There should be another configurable parameter which stops the auto progress 
 if the script has not made any progress.
 The default can be an hour or so - this way we will not get stuck indefinitely.
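
Something along these lines at the session level (the parameter name below is a 
guess for illustration only, not taken from the patch):

-- stop sending automatic progress heartbeats if the script itself has made no
-- progress for an hour, so a hung script eventually fails instead of running forever
set hive.auto.progress.timeout=3600;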

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1361:
-

Status: Open  (was: Patch Available)

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and 
 partition-level stats for partitioned tables. Future work could extend the 
 table-level stats to partitioned tables as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions
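
 As a concrete sketch of milestone 3 above (table and partition names are 
 illustrative):

 -- gather partition-level stats for one partition of a partitioned table
 ANALYZE TABLE page_views PARTITION (ds='2010-09-18') COMPUTE STATISTICS;

 -- gather table-level stats for a non-partitioned table
 ANALYZE TABLE users COMPUTE STATISTICS;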

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12911032#action_12911032
 ] 

Namit Jain commented on HIVE-558:
-

TestJdbcDriver is failing - I haven't debugged it yet. Can you take a look ?
All other tests are passing.

 describe extended table/partition output is cryptic
 ---

 Key: HIVE-558
 URL: https://issues.apache.org/jira/browse/HIVE-558
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Chakka
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
 HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt


 describe extended table prints out the Thrift metadata object directly. The 
 information from it is not easy to read or parse. The output should be easy to 
 read and simple to parse, so that programs can extract the table location etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910470#action_12910470
 ] 

Namit Jain commented on HIVE-1361:
--

Will take a look 

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and 
 partition-level stats for partitioned tables. Future work could extend the 
 table-level stats to partitioned tables as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1616) Add ProtocolBuffersStructObjectInspector

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1616:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Johan

 Add ProtocolBuffersStructObjectInspector
 

 Key: HIVE-1616
 URL: https://issues.apache.org/jira/browse/HIVE-1616
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Johan Oskarsson
Assignee: Johan Oskarsson
Priority: Minor
 Fix For: 0.7.0

 Attachments: HIVE-1616.patch


 Much like there is a ThriftStructObjectInspector that ignores the isset 
 booleans there is a need for a ProtocolBuffersStructObjectInspector that 
 ignores has*. This can then be used together with Twitter's elephant-bird.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread Namit Jain (JIRA)
TestContribNegativeCliDriver fails
--

 Key: HIVE-1650
 URL: https://issues.apache.org/jira/browse/HIVE-1650
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1650:
-

Status: Patch Available  (was: Open)

 TestContribNegativeCliDriver fails
 --

 Key: HIVE-1650
 URL: https://issues.apache.org/jira/browse/HIVE-1650
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1650.1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910747#action_12910747
 ] 

Namit Jain commented on HIVE-1651:
--

+1

 ScriptOperator should not forward any output to downstream operators if an 
 exception is happened
 

 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1651.patch


 ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
 script and then forwards the output from stdout to downstream operators. If 
 anything happens to the script (e.g., it gets killed), the ScriptOperator 
 gets an exception and throws it to upstream operators until MapOperator gets it 
 and calls close(abort). Before ScriptOperator.close() is called, the script 
 output stream can still forward output to downstream operators. We should 
 terminate it immediately.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910763#action_12910763
 ] 

Namit Jain commented on HIVE-1534:
--

You can clean up the patch by not special-casing partitioned columns. 
Otherwise, the patch looks good.

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1652) Delete temporary stats data after some time

2010-09-17 Thread Namit Jain (JIRA)
Delete temporary stats data after some time
---

 Key: HIVE-1652
 URL: https://issues.apache.org/jira/browse/HIVE-1652
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.
If the client dies after some stats have been published, there is no way to 
clean that data.

A simple work-around might be to add the current timestamp to the data - and a 
background process
to clean up old stats. 
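
One way to picture that work-around, as plain SQL against whatever JDBC store the 
stats publisher writes to (the table and column names here are made up):

-- background job: drop temporary stats rows published more than a day ago and
-- never cleaned up because the publishing client died
DELETE FROM tmp_partition_stats
WHERE publish_time < CURRENT_TIMESTAMP - INTERVAL '1' DAY;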

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1653) Ability to enforce correct stats

2010-09-17 Thread Namit Jain (JIRA)
Ability to enforce correct stats


 Key: HIVE-1653
 URL: https://issues.apache.org/jira/browse/HIVE-1653
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.

If one of the mappers/reducers cannot publish stats, it may lead to wrong 
aggregated stats.
There should be a way to avoid this - at the least, a configuration variable 
which fails the 
task if stats cannot be published

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910795#action_12910795
 ] 

Namit Jain commented on HIVE-558:
-

Thanks Paul, I will commit it if the tests pass

 describe extended table/partition output is cryptic
 ---

 Key: HIVE-558
 URL: https://issues.apache.org/jira/browse/HIVE-558
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Chakka
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
 HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt


 describe extended table prints out the Thrift metadata object directly. The 
 information from it is not easy to read or parse. Output should be easily 
 read and can be simple parsed to get table location etc by programs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1639:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Ning

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.2.patch, HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addInputPaths(), which takes 
 a comma-separated string representing a set of paths. If the path name of an 
 input file contains a comma, this code throws an exception: 
 java.lang.IllegalArgumentException: Can not create a Path from an empty 
 string.
 Instead of calling FileInputFormat.addInputPaths(), ExecDriver.addInputPaths 
 should iterate over all paths and call FileInputFormat.addInputPath. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-675:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed in 0.6. Thanks Carl

 add database/schema support Hive QL
 ---

 Key: HIVE-675
 URL: https://issues.apache.org/jira/browse/HIVE-675
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Prasad Chakka
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
 hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
 hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
 HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
 HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
 HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
 HIVE-675.13.patch.txt


 Currently all Hive tables reside in a single namespace (default). Hive should 
 support multiple namespaces (databases or schemas) such that users can create 
 tables in their specific namespaces. These namespaces can have different 
 warehouse directories (with a default naming scheme) and possibly different 
 properties.
 There is already some support for this in the metastore, but the Hive query parser 
 should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910386#action_12910386
 ] 

Namit Jain commented on HIVE-1617:
--

Can you add a negative testcase that times out ?

 ScriptOperator's AutoProgressor can lead to an infinite loop
 

 Key: HIVE-1617
 URL: https://issues.apache.org/jira/browse/HIVE-1617
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang
 Fix For: 0.7.0

 Attachments: HIVE-1617.1.patch


 In the default settings, the auto progressor can result in an infinite loop.
 There should be another configurable parameter which stops the auto progress 
 if the script has not made any progress.
 The default can be an hour or so - this way we will not get stuck indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1617:
-

Status: Open  (was: Patch Available)

 ScriptOperator's AutoProgressor can lead to an infinite loop
 

 Key: HIVE-1617
 URL: https://issues.apache.org/jira/browse/HIVE-1617
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang
 Fix For: 0.7.0

 Attachments: HIVE-1617.1.patch


 In the default settings, the auto progressor can result in an infinite loop.
 There should be another configurable parameter which stops the auto progress 
 if the script has not made any progress.
 The default can be an hour or so - this way we will not get stuck indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910419#action_12910419
 ] 

Namit Jain commented on HIVE-1617:
--

Otherwise it looks good

 ScriptOperator's AutoProgressor can lead to an infinite loop
 

 Key: HIVE-1617
 URL: https://issues.apache.org/jira/browse/HIVE-1617
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang
 Fix For: 0.7.0

 Attachments: HIVE-1617.1.patch


 In the default settings, the auto progressor can result in an infinite loop.
 There should be another configurable parameter which stops the auto progress 
 if the script has not made any progress.
 The default can be an hour or so - this way we will not get stuck indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-15 Thread Namit Jain (JIRA)
convert commonly used udfs to generic udfs
--

 Key: HIVE-1638
 URL: https://issues.apache.org/jira/browse/HIVE-1638
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong


Copying a mail from Joy:


i did a little bit of profiling of a simple hive group by query today. i was 
surprised to see that one of the most expensive functions was in converting 
the equals udf (i had some simple string filters) to generic udfs. 
(primitiveobjectinspectorconverter.textconverter)

am i correct in thinking that the fix is to simply port some of the most 
popular udfs (string equality/comparison etc.) to generic udfs?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909903#action_12909903
 ] 

Namit Jain commented on HIVE-1639:
--

Can you add a testcase ?

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addInputPaths(), which takes 
 a comma-separated string representing a set of paths. If the path name of an 
 input file contains a comma, this code throws an exception: 
 java.lang.IllegalArgumentException: Can not create a Path from an empty 
 string.
 Instead of calling FileInputFormat.addInputPaths(), ExecDriver.addInputPaths 
 should iterate over all paths and call FileInputFormat.addInputPath. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1641) add map joined table to distributed cache

2010-09-15 Thread Namit Jain (JIRA)
add map joined table to distributed cache
-

 Key: HIVE-1641
 URL: https://issues.apache.org/jira/browse/HIVE-1641
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


Currently, the mappers directly read the map-joined table from HDFS, which 
makes it difficult to scale.
We end up getting lots of timeouts once the number of mappers is beyond a few 
thousand, due to 
concurrent mappers.

It would be a good idea to put the map-joined file into the distributed cache and 
read from there instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-15 Thread Namit Jain (JIRA)
Convert join queries to map-join based on size of table/row
---

 Key: HIVE-1642
 URL: https://issues.apache.org/jira/browse/HIVE-1642
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


Based on the number of rows and the size of each table, Hive should automatically 
be able to convert a join into a map-join.
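
Today the same effect needs an explicit hint; the goal is for the optimizer to 
make this decision itself (a sketch, with illustrative table names):

-- small_dim fits in memory, so each mapper loads it and joins locally,
-- skipping the shuffle/reduce phase entirely
SELECT /*+ MAPJOIN(d) */ f.user_id, d.country
FROM fact f JOIN small_dim d ON (f.dim_id = d.id);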


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1641) add map joined table to distributed cache

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1641:


Assignee: Liyin Tang

 add map joined table to distributed cache
 -

 Key: HIVE-1641
 URL: https://issues.apache.org/jira/browse/HIVE-1641
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0


 Currently, the mappers directly read the map-joined table from HDFS, which 
 makes it difficult to scale.
 We end up getting lots of timeouts once the number of mappers is beyond a 
 few thousand, due to 
 concurrent mappers.
 It would be a good idea to put the map-joined file into the distributed cache and 
 read from there instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1642:


Assignee: Liyin Tang

 Convert join queries to map-join based on size of table/row
 ---

 Key: HIVE-1642
 URL: https://issues.apache.org/jira/browse/HIVE-1642
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0


 Based on the number of rows and the size of each table, Hive should automatically 
 be able to convert a join into a map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909956#action_12909956
 ] 

Namit Jain commented on HIVE-1639:
--

+1

looks good - will commit if the tests pass

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.2.patch, HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addInputPaths(), which takes 
 a comma-separated string representing a set of paths. If the path name of an 
 input file contains a comma, this code throws an exception: 
 java.lang.IllegalArgumentException: Can not create a Path from an empty 
 string.
 Instead of calling FileInputFormat.addInputPaths(), ExecDriver.addInputPaths 
 should iterate over all paths and call FileInputFormat.addInputPath. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)
ability to specify parent directory for zookeeper lock manager
--

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch

For concurrency support, it would be desirable if all the locks were created 
under a common parent, so that zookeeper can be used
for different purposes.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1645:
-

Attachment: hive.1645.1.patch

 ability to specify parent directory for zookeeper lock manager
 --

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch


 For concurrency support, it would be desirable if all the locks were created 
 under a common parent, so that zookeeper can be used
 for different purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909982#action_12909982
 ] 

Namit Jain commented on HIVE-1645:
--

Added a new configuration parameter 'hive.zookeeper.namespace' under which all 
the locks will be created if zookeeper is being used as the lock manager.
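
For example (the values shown are only illustrative):

-- use zookeeper-based locking, with every lock znode created under one common parent
set hive.support.concurrency=true;
set hive.zookeeper.namespace=hive_locks;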


 ability to specify parent directory for zookeeper lock manager
 --

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch


 For concurrency support, it would be desirable if all the locks were created 
 under a common parent, so that zookeeper can be used
 for different purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910020#action_12910020
 ] 

Namit Jain commented on HIVE-675:
-

I will take a look

 add database/schema support Hive QL
 ---

 Key: HIVE-675
 URL: https://issues.apache.org/jira/browse/HIVE-675
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Prasad Chakka
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
 hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
 hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
 HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
 HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
 HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
 HIVE-675.13.patch.txt


 Currently all Hive tables reside in a single namespace (default). Hive should 
 support multiple namespaces (databases or schemas) such that users can create 
 tables in their specific namespaces. These namespaces can have different 
 warehouse directories (with a default naming scheme) and possibly different 
 properties.
 There is already some support for this in the metastore, but the Hive query parser 
 should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1635) ability to check partition size for dynamic partitions

2010-09-14 Thread Namit Jain (JIRA)
ability to check partition size for dynamic partitions
-

 Key: HIVE-1635
 URL: https://issues.apache.org/jira/browse/HIVE-1635
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


With dynamic partitions, it becomes very easy to create partitions.

We have seen some scenarios where a lot of partitions/files get created due to 
some corrupt data (1 corrupt row 
can end up creating a partition and a lot of files - one per mapper, if merge 
is false).

This puts a lot of load on the cluster, and is a debugging nightmare.

It would be good to have a configuration parameter for the minimum number of 
rows for a partition.
If the number of rows is less than the threshold, the partition need not be 
created. The default value
of this parameter can be zero for backward compatibility.
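
For instance (a sketch with illustrative table and column names), a single row 
with a corrupt ds value in the source is enough to create an extra partition 
directory with its own set of files:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- every distinct ds value coming out of the select, corrupt or not, becomes a partition
INSERT OVERWRITE TABLE events PARTITION (ds)
SELECT user_id, action, ds FROM raw_events;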

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909388#action_12909388
 ] 

Namit Jain commented on HIVE-1534:
--

Since we are still pushing filters for non-outer joins, the assumption that 
a row will always be output despite the filters holds, and therefore 
we don't need a progress update.

Cool, I will take a look at the patch again

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909539#action_12909539
 ] 

Namit Jain commented on HIVE-1534:
--

Did you run all the tests ? Some of the tests should break - minimally a change 
of explain plans.
What about semi joins ?

Why did you add a genExprNode() ? Can't you re-use the one from SemanticAnalyzer ?

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534-1.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908854#action_12908854
 ] 

Namit Jain commented on HIVE-1620:
--

The approach looks good, but can you move all the checks to compile time 
instead ?

I mean, while generating a plan, create an S3FileSinkOperator instead of 
FileSinkOperator, if the
destination under consideration is on the S3 FileSystem - there will be no move 
task etc.
The explain will show the correct plan.


The commit for S3FileSystem will be a no-op. That way, FileSinkOperator does 
not change much.

 Patch to write directly to S3 from Hive
 ---

 Key: HIVE-1620
 URL: https://issues.apache.org/jira/browse/HIVE-1620
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1620.patch


 We want to submit a patch to Hive which allows user to write files directly 
 to S3.
 This patch allow user to specify an S3 location as the table output location 
 and hence eliminates the need  of copying data from HDFS to S3.
 Users can run Hive queries directly over the data stored in S3.
 This patch helps integrate hive with S3 better and quicker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908858#action_12908858
 ] 

Namit Jain commented on HIVE-1534:
--

The approach looks OK - I will look into the code for more detailed comments.

One general comment was that you also need to account for progress if the join 
filters filter out all the rows.
Otherwise the task tracker may consider the task unresponsive. Look at the filter 
operator - we send a progress update in
that case if there are 'n' consecutive rows filtered out.

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1621) Disable join filters for outer joins.

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908859#action_12908859
 ] 

Namit Jain commented on HIVE-1621:
--

See https://issues.apache.org/jira/browse/HIVE-1534 for the offending test case.

 Disable join filters for outer joins.
 -

 Key: HIVE-1621
 URL: https://issues.apache.org/jira/browse/HIVE-1621
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 As suggested at [comment 
 |https://issues.apache.org/jira/browse/HIVE-1534?focusedCommentId=12907001page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12907001],
  SemanticAnalyzer should give an error if a join filter is specified for outer 
 joins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908874#action_12908874
 ] 

Namit Jain commented on HIVE-1622:
--

Committed the new log for 0.17

 Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
 ---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch


 Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
 merging files generated by mappers. It should be used for files generated by 
 reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908483#action_12908483
 ] 

Namit Jain commented on HIVE-1630:
--

+1

will commit if the tests pass

 bug in NO_DROP
 --

 Key: HIVE-1630
 URL: https://issues.apache.org/jira/browse/HIVE-1630
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0

 Attachments: HIVE-1630.2.patch


 If the table is marked NO_DROP, we should still be able to drop old 
 partitions.
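
 For reference, the situation being described looks roughly like this (a sketch; 
 the table name and partition value are illustrative):

 -- protect the table from being dropped
 ALTER TABLE events ENABLE NO_DROP;

 -- dropping an old partition should still be possible even though the table is protected
 ALTER TABLE events DROP PARTITION (ds='2010-01-01');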

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1630.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Siying

 bug in NO_DROP
 --

 Key: HIVE-1630
 URL: https://issues.apache.org/jira/browse/HIVE-1630
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0

 Attachments: HIVE-1630.2.patch


 If the table is marked NO_DROP, we should still be able to drop old 
 partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1630) bug in NO_DROP

2010-09-10 Thread Namit Jain (JIRA)
bug in NO_DROP
--

 Key: HIVE-1630
 URL: https://issues.apache.org/jira/browse/HIVE-1630
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0


If the table is marked NO_DROP, we should still be able to drop old partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907367#action_12907367
 ] 

Namit Jain commented on HIVE-1622:
--

+1

looks good

 Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
 ---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1622.patch


 Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
 merging files generated by mappers. It should be used for files generated by 
 reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1622:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Ning

 Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
 ---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1622.patch


 Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
 merging files generated by mappers. It should be used for files generated by 
 reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-07 Thread Namit Jain (JIRA)
ScriptOperator's AutoProgressor can lead to an infinite loop


 Key: HIVE-1617
 URL: https://issues.apache.org/jira/browse/HIVE-1617
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


In the default settings, the auto progressor can result in an infinite loop.

There should be another configurable parameter which stops the auto progress if 
the script has not made any progress.
The default can be an hour or so - this way we will not get stuck indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907001#action_12907001
 ] 

Namit Jain commented on HIVE-1534:
--

Definitely a bug, but not related to 
https://issues.apache.org/jira/browse/HIVE-1538.

For outer joins, the filters should not be pushed above the join.


For the query,

SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 
100); 


the row: 100 100

is being pruned even before it reaches the join.

As you suggested above, the correct solution is to have the filter as part of 
the join, which we don't support currently.

For now, I would suggest not supporting filters in the join condition for outer 
joins, since we are returning wrong results,
and the correct fix will involve a big change
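
To spell out the expected behaviour (a sketch, not output from the patch): with the 
filter evaluated as part of the join, the a.key = 100 row must still appear, just 
with NULLs on the right side.

SELECT * FROM input3 a LEFT OUTER JOIN input3 b
ON (a.key = b.key AND a.key < 100);
-- a left outer join has to preserve every row of a; when the join condition is
-- false for a.key = 100, the b columns should come back as NULL instead of the
-- whole row disappearing, which is what the pushed-up filter causes today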

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1616) Add ProtocolBuffersStructObjectInspector

2010-09-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907013#action_12907013
 ] 

Namit Jain commented on HIVE-1616:
--

Can you add a testcase for this ? The changes look good

 Add ProtocolBuffersStructObjectInspector
 

 Key: HIVE-1616
 URL: https://issues.apache.org/jira/browse/HIVE-1616
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Johan Oskarsson
Assignee: Johan Oskarsson
Priority: Minor
 Attachments: HIVE-1616.patch


 Much like there is a ThriftStructObjectInspector that ignores the isset 
 booleans there is a need for a ProtocolBuffersStructObjectInspector that 
 ignores has*. This can then be used together with Twitter's elephant-bird.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1619) pipes not correctly being handled in transform

2010-09-07 Thread Namit Jain (JIRA)
pipes not correctly being handled in transform
--

 Key: HIVE-1619
 URL: https://issues.apache.org/jira/browse/HIVE-1619
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


Akhil had a testcase where he had a query like:


select transform(*) using 'ex1 | ex2' from T;

There was some problem with how the pipe inside the USING clause was handled.

The workaround is to move 'ex1 | ex2' into a separate executable
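
A hedged sketch of that workaround. The script name ex_pipeline.sh is 
hypothetical; it would simply contain the shell pipeline "ex1 | ex2", so that 
Hive only has to launch a single executable rather than interpret the pipe 
itself.

    -- Hedged sketch of the suggested workaround; ex_pipeline.sh is a
    -- hypothetical wrapper whose body is just:  ex1 | ex2
    ADD FILE ex_pipeline.sh;

    SELECT TRANSFORM (*)
           USING 'ex_pipeline.sh'
    FROM T;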

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1620:


Assignee: Vaibhav Aggarwal

 Patch to write directly to S3 from Hive
 ---

 Key: HIVE-1620
 URL: https://issues.apache.org/jira/browse/HIVE-1620
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1620.patch


 We want to submit a patch to Hive which allows users to write files directly 
 to S3.
 This patch allows the user to specify an S3 location as the table output 
 location and hence eliminates the need to copy data from HDFS to S3.
 Users can run Hive queries directly over the data stored in S3.
 This patch helps integrate Hive with S3 more tightly and quickly.
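
A hedged sketch of the usage this patch aims to enable, namely pointing a 
table's location directly at S3 so query output lands there without a separate 
HDFS-to-S3 copy. The bucket, path, and URI scheme below are placeholders, not 
part of the attached patch.

    -- Hedged sketch: query output written straight to S3, no HDFS copy step.
    -- Bucket/path and the s3:// scheme are illustrative assumptions.
    CREATE EXTERNAL TABLE s3_output (key STRING, value STRING)
    LOCATION 's3://my-bucket/hive/output/';

    INSERT OVERWRITE TABLE s3_output
    SELECT key, value FROM src;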

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907093#action_12907093
 ] 

Namit Jain commented on HIVE-1620:
--

How will this handle failures and speculative execution?
If you are directly writing to a file, can't it lead to duplicate data?

 Patch to write directly to S3 from Hive
 ---

 Key: HIVE-1620
 URL: https://issues.apache.org/jira/browse/HIVE-1620
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1620.patch


 We want to submit a patch to Hive which allows users to write files directly 
 to S3.
 This patch allows the user to specify an S3 location as the table output 
 location and hence eliminates the need to copy data from HDFS to S3.
 Users can run Hive queries directly over the data stored in S3.
 This patch helps integrate Hive with S3 more tightly and quickly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-09-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1290#action_1290
 ] 

Namit Jain commented on HIVE-1538:
--

That is right - this has nothing to do with map join.
Whenever a predicate is pushed down, it is also retained, resulting in two 
identical filters.

Is this creating a performance problem? It can definitely be optimized.
I totally agree with your proposed solution.
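
A hedged sketch of the behavior being discussed, assuming a simple filtered 
join over placeholder tables t1 and t2; the described plan shape is an 
expectation based on this comment, not captured EXPLAIN output.

    -- Hedged sketch: with ppd on, the predicate is pushed to the table scan
    -- and also retained above it, so the plan is expected to show two
    -- identical FilterOperators; the second one filters zero additional rows.
    set hive.optimize.ppd=true;

    EXPLAIN
    SELECT a.c1, b.c2
    FROM t1 a JOIN t2 b ON (a.c1 = b.c1)
    WHERE a.c1 > 100;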

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


