[jira] Commented: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910799#action_12910799 ]

Ashish Thusoo commented on HIVE-307:

Hi Kirk,

Thanks for the contribution. Can you add a simple test case with your patch?

Ashish

LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
-------------------------------------------------------------------------------------

Key: HIVE-307
URL: https://issues.apache.org/jira/browse/HIVE-307
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Priority: Critical
Attachments: HIVE-307.patch

Failed with exception checkPaths: /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already exists
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
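For context, a minimal sketch of how the failure can be reproduced (the local file path is hypothetical; the table and file names follow the error message above):

{code}
CREATE TABLE tmp_user_msg_history (msg STRING);

-- The first load succeeds and places test_user_msg_history in the warehouse directory.
LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history' INTO TABLE tmp_user_msg_history;

-- Loading a file with the same name again (without OVERWRITE) fails:
--   Failed with exception checkPaths: .../tmp_user_msg_history/test_user_msg_history already exists
LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history' INTO TABLE tmp_user_msg_history;
{code}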
[jira] Updated: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-307:

Status: Open (was: Patch Available)
Assignee: Kirk True

Cancelling the patch because of a missing test case. Kirk, it would be great if you can resubmit with the test case. Otherwise the code looks fine to me.

Ashish

LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
-------------------------------------------------------------------------------------

Key: HIVE-307
URL: https://issues.apache.org/jira/browse/HIVE-307
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Kirk True
Priority: Critical
Attachments: HIVE-307.patch

Failed with exception checkPaths: /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already exists
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: [VOTE] Hive as a TLP
With 10 +1 votes this vote passes. Owen, please forward this to the Apache board.

Thanks,
Ashish

-----Original Message-----
From: Tom White [mailto:t...@cloudera.com]
Sent: Friday, August 27, 2010 10:24 AM
To: gene...@hadoop.apache.org
Subject: Re: [VOTE] Hive as a TLP

+1

Tom

On Thu, Aug 26, 2010 at 1:01 PM, Ashish Thusoo athu...@facebook.com wrote:

The Hive development community voted and passed the following resolution. The details of the vote are at http://www.bit.ly/aJogyU

The PMC will comprise the current committers on Hive (as of 8/24/2010), with Namit Jain being the chair. Please vote on sending this resolution to the Apache Board.

Thanks,
Ashish

Draft Resolution to be sent to the Apache Board
-----------------------------------------------

Establish the Apache Hive Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to parallel analysis of large data sets for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Hive Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Hive Project be and hereby is responsible for the creation and maintenance of software related to parallel analysis of large data sets; and be it further

RESOLVED, that the office of Vice President, Apache Hive be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hive Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hive Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hive Project:

* Namit Jain (na...@apache.org)
* John Sichi (j...@apache.org)
* Zheng Shao (zs...@apache.org)
* Edward Capriolo (appodic...@apache.org)
* Raghotham Murthy (r...@apache.org)
* Ning Zhang (nzh...@apache.org)
* Paul Yang (pa...@apache.org)
* He Yongqiang (heyongqi...@apache.org)
* Prasad Chakka (pras...@apache.org)
* Joydeep Sen Sarma (jsensa...@apache.org)
* Ashish Thusoo (athu...@apache.org)

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain be appointed to the office of Vice President, Apache Hive, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache Hive PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache Hive Project; and be it further

RESOLVED, that the Apache Hive Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop Hive sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hive sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.
[VOTE] Draft Resolution to make Hive a TLP
Folks,

I am going to make the following proposal at gene...@hadoop.apache.org. In summary this proposal does the following things:

1. Establishes the PMC as comprising the current committers of Hive (as of today - 8/24/2010).
2. Proposes Namit Jain as the chair of the project (PMC chairs have no more power than other PMC members, but they are responsible for writing regular reports for the Apache board, assigning rights to new committers, etc.)
3. Tasks the PMC to come up with the bylaws for governance of the project.

Please vote on this as soon as possible (yes, I should have done this as part of the earlier vote, but please bear with me), so that we can get the ball rolling on this...

Thanks,
Ashish

Draft Resolution to be sent to the Apache Board
-----------------------------------------------

Establish the Apache Hive Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to parallel analysis of large data sets for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Hive Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Hive Project be and hereby is responsible for the creation and maintenance of software related to parallel analysis of large data sets; and be it further

RESOLVED, that the office of Vice President, Apache Hive be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hive Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hive Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hive Project:

* Namit Jain (na...@apache.org)
* John Sichi (j...@apache.org)
* Zheng Shao (zs...@apache.org)
* Edward Capriolo (appodic...@apache.org)
* Raghotham Murthy (r...@apache.org)
* Ning Zhang (nzh...@apache.org)
* Paul Yang (pa...@apache.org)
* He Yongqiang (heyongqi...@apache.org)
* Prasad Chakka (pras...@apache.org)
* Joydeep Sen Sarma (jsensa...@apache.org)
* Ashish Thusoo (athu...@apache.org)

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain be appointed to the office of Vice President, Apache Hive, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache Hive PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache Hive Project; and be it further

RESOLVED, that the Apache Hive Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop Hive sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hive sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.
RE: [DISCUSSION] Move to become a TLP
Thanks to everyone who voted. Looks like this is unanimous at this point. I will start the proceedings in the Hadoop PMC to make Hive a TLP.

Ashish

-----Original Message-----
From: Paul Yang [mailto:py...@facebook.com]
Sent: Thursday, August 19, 2010 4:05 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Thursday, August 19, 2010 3:30 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-----Original Message-----
From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, August 19, 2010 3:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

+1

On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang nzh...@facebook.com wrote:

+1 as well.

On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:

+1.

Zheng

On Mon, Aug 16, 2010 at 11:58 AM, John Sichi jsi...@facebook.com wrote:

+1 from me. The momentum on cross-company collaboration we're seeing now, plus big integration contributions such as the new storage handlers (HyperTable and Cassandra), are all signs that Hive is growing up fast. HBase recently took the same route, so I'm going to have a chat with Jonathan Gray to find out what that involved for them.

JVS

On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:

Yes, I think Hive is ready to become a TLP.

On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo athu...@facebook.com wrote:

Nice one Ed...

Folks, please chime in. I think we should close this out next week one way or the other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo athu...@facebook.com wrote:

Folks, this question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

I thought of one more benefit. We can rename our packages from org.apache.hadoop.hive.* to org.apache.hive.* :)

--
Yours, Zheng
http://www.linkedin.com/in/zshao
RE: [DISCUSSION] Move to become a TLP
Nice one Ed...

Folks, please chime in. I think we should close this out next week one way or the other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo athu...@facebook.com wrote:

Folks, this question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

I thought of one more benefit. We can rename our packages from org.apache.hadoop.hive.* to org.apache.hive.* :)
[DISCUSSION] Move to become a TLP
Folks,

This question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish
RE: Hive should start moving to the new hadoop mapreduce api.
+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: Hive should start moving to the new hadoop mapreduce api.
Yes, these are mutually exclusive.

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 11:20 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Aren't these things mutually exclusive? The new MapReduce API appeared in 0.20. Deprecating 0.17 seems reasonable, but we still have to support the old API for 0.18 and 0.19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:

+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: Hive should start moving to the new hadoop mapreduce api.
Before deciding that, we should poll the user list to see if this would be too disruptive for anyone.

Ashish

-----Original Message-----
From: Ning Zhang [mailto:nzh...@facebook.com]
Sent: Thursday, July 29, 2010 12:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Maybe we should make hive-0.7 the last branch to support the pre-0.20 hadoop API, and later branches of Hive will be switched to the new hadoop API?

On Jul 29, 2010, at 11:53 AM, Ashish Thusoo wrote:

Yes, these are mutually exclusive.

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 11:20 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Aren't these things mutually exclusive? The new MapReduce API appeared in 0.20. Deprecating 0.17 seems reasonable, but we still have to support the old API for 0.18 and 0.19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:

+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: [howldev] Initial thoughts on authorization in howl
Hi Pradeep,

I get from this note that the authorization you are talking about here is basically the management of the permissions on the hdfs directories corresponding to the tables and the partitions. So from that angle this sounds good to me.

There is a whole set of permissions/authorizations with regard to the metadata operations themselves, e.g. who should be able to run an alter table add column or describe table etc. I presume that would be beyond the scope of this change and would come in later? I am thinking more in terms of the permissions model that is supported in SQL using GRANT statements etc.

I also presume that by conf variables you mean the key/value properties that Hive can store in the metadata, and not the hive conf variables, right?

Ashish

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Wednesday, July 28, 2010 2:22 PM
To: hive-dev@hadoop.apache.org
Subject: Fwd: [howldev] Initial thoughts on authorization in howl

Begin forwarded message:

From: Pradeep Kamath prade...@yahoo-inc.com
Date: July 27, 2010 4:38:42 PM PDT
To: howl...@yahoogroups.com
Subject: [howldev] Initial thoughts on authorization in howl
Reply-To: howl...@yahoogroups.com

The initial thoughts on authorization in howl are to model authorization (for DDL ops like create table/drop table/add partition etc.) after hdfs permissions. To be able to do this, we would like to extend createTable() to add the ability to record a different group from the user's primary group and to record the complete unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with hive, I propose having conf variables to achieve these objectives:

- table.group.name - the value will indicate the name of the unix group for the table directory. This will be used by createTable() to perform a chgrp to the value provided. This property will give the user the ability to choose which of the many unix groups he is part of to associate with the table.

- table.permissions - the value will be of the form rwxrwxrwx to indicate read-write-execute permissions on the table directory. This will be used by createTable() to perform a chmod to the value provided. This will let the user decide what permissions he wants on the table.

- partitions.inherit.permissions - a value of true will indicate that partitions inherit the group name and permissions of the table-level directory. This will be used by addPartition() to perform a chgrp and chmod to the values as on the table directory.

I favor conf properties over API changes since the complete authorization design for hive is not finalized yet. These properties can be deprecated/removed when that is in place. These properties would also be useful to some installations of vanilla hive, since at least DFS-level authorization can then be achieved by hive without the user having to manually perform chgrp and chmod operations on DFS.

I would like to hear from hive developers/committers whether this would be acceptable for hive, and also thoughts from others.
Pradeep
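To make the proposal concrete, a hypothetical session using the proposed properties might look like the sketch below. The property names come from the proposal above; the group name, permission bits, and table are illustrative, and the exact syntax was not finalized.

{code}
-- Proposed (not final) conf properties, set before creating the table:
SET table.group.name=etlusers;            -- createTable() would chgrp the table dir to this group
SET table.permissions=rwxrwx---;          -- createTable() would chmod the table dir to these bits
SET partitions.inherit.permissions=true;  -- addPartition() would copy group/permissions from the table dir

CREATE TABLE page_views (user_id INT, url STRING) PARTITIONED BY (ds STRING);

-- The new partition directory would inherit etlusers/rwxrwx--- from the table directory:
ALTER TABLE page_views ADD PARTITION (ds='2010-07-27');
{code}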
RE: Hive Web Interface Broken YET AGAIN!
Can you point to the JIRA that introduced this problem?

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 7:38 AM
To: hive-dev@hadoop.apache.org
Subject: Hive Web Interface Broken YET AGAIN!

All,

While the web interface is not as widely used as the cli, people do use it. Its init process has been broken three times that I can remember: once by the shims, once by adding version numbers to the jars, and now it is affected by the libjars change.

[r...@etl02 ~]# hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:133)
        at java.util.jar.JarFile.<init>(JarFile.java:70)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

I notice someone patched the cli to deal with this. There is no test coverage for the shell scripts, but it seems like only some of the scripts were repaired:

bin/ext/cli.sh
bin/ext/lineage.sh
bin/ext/metastore.sh

I wonder why only half the scripts were repaired? In general, if something changes in hive or hadoop that causes the cli to break, we should fix it across the board. I feel like every time a release is coming up, I test drive the web interface and find that a simple script problem stops it from running.

Edward
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892932#action_12892932 ]

Ashish Thusoo commented on HIVE-417:

Started looking at this. One initial question I had - why is the VirtualColumn class in the serde2 package?

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892939#action_12892939 ]

Ashish Thusoo commented on HIVE-417:

Also, how is the file name populated? Is that not done through the IOContext?

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-1264:

Assignee: Venkatesh S

Make Hive work with Hadoop security
-----------------------------------

Key: HIVE-1264
URL: https://issues.apache.org/jira/browse/HIVE-1264
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
Attachments: HiveHadoop20S_patch.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892517#action_12892517 ]

Ashish Thusoo commented on HIVE-1264:

Can these changes be packed into the shims layer? That way all the calls can be replaced with a call to shims, with the shim for 20.1xx doing the right thing.

Make Hive work with Hadoop security
-----------------------------------

Key: HIVE-1264
URL: https://issues.apache.org/jira/browse/HIVE-1264
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
Attachments: HiveHadoop20S_patch.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884685#action_12884685 ]

Ashish Thusoo commented on HIVE-417:

Looked at the code and have some questions...

Can you explain how the metastore object model is laid out? It seems that the table names of the index are stored in the key/value properties of the table that the index is created on. Is that correct? Would it be better to put a key reference from the index table to the base table instead (similar to what is done for partitions)?

Also, how would this be used to query the table? Can you give an example? Is the idea here to select from the index and then pass the offsets to another query to look up the table? An example or a test which shows the query on the base table would be useful.

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
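For readers following the thread: the two-step usage pattern being asked about might look roughly like the sketch below. The index table name and its _bucketname/_offsets columns are illustrative assumptions, since the on-disk layout was still being discussed in this JIRA.

{code}
-- Hypothetical sketch: an index table that stores, per key value, the
-- file names and offsets of the matching blocks in the base table.
SELECT `_bucketname`, `_offsets`
FROM base_table_key_index
WHERE key = 100;

-- The offsets returned by the first query would then be used to restrict
-- the scan of the base table (e.g. via a filtering input format) before
-- running the actual query:
SELECT * FROM base_table WHERE key = 100;
{code}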
[jira] Commented: (HIVE-287) count distinct on multiple columns does not work
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884699#action_12884699 ]

Ashish Thusoo commented on HIVE-287:

@John
Another disadvantage of doing C that I can think of is the fact that count would become a keyword, and then any column named count would have to be quoted. Not a big deal, but just something that would be a side effect of going with C.

count distinct on multiple columns does not work
------------------------------------------------

Key: HIVE-287
URL: https://issues.apache.org/jira/browse/HIVE-287
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch

The following query does not work:

select count(distinct col1, col2) from Tbl

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
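As an aside for readers hitting this today: until the fix lands, the multi-column distinct count can be rewritten with a subquery. A sketch, reusing the table and columns from the report:

{code}
-- Fails before this fix:
--   select count(distinct col1, col2) from Tbl;

-- Workaround: deduplicate first, then count the distinct pairs.
-- (The two forms treat rows containing NULLs slightly differently.)
SELECT count(1)
FROM (SELECT DISTINCT col1, col2 FROM Tbl) t;
{code}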
[jira] Created: (HIVE-1449) Table aliases in order by clause lead to semantic analysis failure
Table aliases in order by clause lead to semantic analysis failure
-------------------------------------------------------------------

Key: HIVE-1449
URL: https://issues.apache.org/jira/browse/HIVE-1449
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
Fix For: 0.7.0

A simple statement of the form

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order by a.account_id;

throws a semantic analysis exception, whereas

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order by account_id;

works fine (the second query does not have the table alias a in the order by clause).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
[ https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884401#action_12884401 ]

Ashish Thusoo commented on HIVE-1428:

Is your question about the fact that build.dir is an empty string? build.dir gets defined in build-common.xml, which in turn picks up properties from build.properties. The build.xml in the metastore directory includes build-common.xml, so it should be getting build.dir. How are you running this test?

ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
--------------------------------------------------------------

Key: HIVE-1428
URL: https://issues.apache.org/jira/browse/HIVE-1428
Project: Hadoop Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang
Attachments: HIVE-1428.patch, TestHiveMetaStoreRemote.java

If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD PARTITION commands will fail with an error similar to the following:

[prade...@chargesize:~/dev/howl] hive --auxpath ult-serde.jar -e "ALTER TABLE mytable add partition(datestamp = '20091101', srcid = '10',action) location '/user/pradeepk/mytable/20091101/10';"
10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
FAILED: Error in metadata: org.apache.thrift.TApplicationException: get_partition failed: unknown result
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[prade...@chargesize:~/dev/howl]

This is due to a check that tries to retrieve the partition to see if it exists. If it does not, an attempt is made to pass a null value back from the metastore. Since thrift does not support null return values, an exception is thrown.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: predicate pushdown, HIVE-1395, and HIVE-1342
I will look into those.

Ashish

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Monday, June 28, 2010 4:54 PM
To: hive-dev@hadoop.apache.org
Subject: predicate pushdown, HIVE-1395, and HIVE-1342

Could the person who originally developed predicate pushdown take a look at these two bugs and add hints?

Thanks,
JVS
[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-1271:

Status: Resolved (was: Patch Available)
Resolution: Fixed

Committed. Thanks Arvind!

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: 6.0 and trunk look broken to me
Not sure if this is just my env, but on 0.6.0 when I run the unit tests I get a bunch of errors of the following form:

[junit] Begin query: alter3.q
[junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
[junit]         at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
[junit]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]         at java.lang.reflect.Method.invoke(Method.java:597)
[junit]         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
[junit]         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
[junit]         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
[junit]         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
[junit]         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Wednesday, June 23, 2010 2:15 PM
To: hive-dev@hadoop.apache.org
Subject: Re: 6.0 and trunk look broken to me

(You mean 0.6, right?)

I'm not able to reproduce this (just tested with latest trunk on Linux and Mac). Is anyone else seeing it?

JVS

On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:

Trunk and 0.6 both show this in hadoop local mode and hadoop distributed mode.

[edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'

[edw...@ec dist]$ more /tmp/edward/hive.log
2010-06-23 16:41:00,749 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot recognize input '<EOF>'
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881998#action_12881998 ]

Ashish Thusoo commented on HIVE-1271:

I have committed this to trunk and will commit to 0.6.0 soon. One thing I did overlook though: we should add a test case for this. Can you do that as part of another JIRA, as this one is already partially committed?

Thanks,
Ashish

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
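For whoever picks up the follow-up JIRA: the regression test could be a small qfile along these lines. The file name is hypothetical; the query is adapted from the bug report, using the standard src test table and 'cat' as an identity reducer.

{code}
-- ql/src/test/queries/clientpositive/reduce_case_sensitivity.q (hypothetical name)
CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);

FROM (SELECT * FROM src DISTRIBUTE BY key SORT BY key) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'cat' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);

-- Before the fix, compilation failed with "Cannot convert column 2 from
-- array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>".
SELECT * FROM SS;
{code}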
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881306#action_12881306 ]

Ashish Thusoo commented on HIVE-1271:

I am looking at this.

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881319#action_12881319 ]

Ashish Thusoo commented on HIVE-1271:

Looks good to me. However, why remove the check on Category? Also, why drop the default implementation of the equals method for TypeInfo?

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: Vertical partitioning
If you are querying this data again and again, you could just create another table which has only those 10 columns (more like a materialized view approach, though that is not in Hive yet). This of course uses up some space as compared to vertical partitioning, but if the rcfile performance is not good enough, this could be the workaround for now. Also, do you see a lot more time spent on I/O in your queries?

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma jaydeep.vishwaka...@mkhoj.com wrote:

Just looking at the opportunity and feasibility for it. One of my tables has more than 20 fields, where most of the time I need only the 10 main fields. We rarely need the other fields for day-to-day analysis.

Regards,
Jaydeep

Ning Zhang wrote:

Hive supports columnar storage (RCFile) but not vertical partitioning. Is there any use case for vertical partitioning?

On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:

Hi,

Does hive support vertical partitioning?

Regards,
Jaydeep

Vertical partitioning is just as practical in a traditional RDBMS as it would be in hive. Normally you would do it for a few reasons:

1) You have some rarely used columns and you want to reduce the table/row size.
2) Your DBMS has terrible blob/clob/text support and the only way to get large objects out of the way is to put them in other tables.

If you go the route of vertical partitioning in hive, you may have to join to select the columns you need. I do not consider row serialization and deserialization to be the majority of a hive job, and in most cases hadoop handles one large file better than two smaller ones. Then again, we have some tables with 140+ columns, so I can see vertical partitioning helping with those tables, but it doubles the management.
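A sketch of the workaround Ashish describes, assuming a wide table named wide_table and an illustrative trio of hot columns (the real case would list the 10 frequently used ones):

{code}
-- Narrow copy holding only the frequently used columns; RCFile keeps it columnar.
CREATE TABLE wide_table_hot (c1 STRING, c2 STRING, c3 STRING) STORED AS RCFILE;

-- Refresh whenever the base table changes; this is the extra space and
-- maintenance cost of the materialized-view-style approach.
INSERT OVERWRITE TABLE wide_table_hot
SELECT c1, c2, c3 FROM wide_table;
{code}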
RE: how to set the debug parameters of hive?
I think if you just pass the java parameters on the command line it should just work: so bin/hive followed by your parameters. I have not tried it though; mostly I am just able to debug using eclipse. (You can create the related eclipse files by doing:

cd metastore
ant model-jar
cd ..
ant eclipse-files
)

Ashish

-----Original Message-----
From: Zhou Shuaifeng [mailto:zhoushuaif...@huawei.com]
Sent: Friday, June 11, 2010 12:00 AM
To: hive-dev@hadoop.apache.org
Cc: ac.pi...@huawei.com
Subject: how to set the debug parameters of hive?

Hi,

I want to debug hive remotely; how do I set the config? E.g. debugging hdfs is done by setting DEBUG_PARAMETERS in the file 'bin/hadoop', so how do I set the debug parameters of hive?

Thanks a lot.
[jira] Updated: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-1373:

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
Resolution: Fixed

Committed. Thanks Vinithra!!

Missing connection pool plugin in Eclipse classpath
---------------------------------------------------

Key: HIVE-1373
URL: https://issues.apache.org/jira/browse/HIVE-1373
Project: Hadoop Hive
Issue Type: Bug
Components: Build Infrastructure
Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Fix For: 0.6.0
Attachments: HIVE-1373.patch

In a recent checkin, a connection pool dependency was introduced but the eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail.

{code}
hive> show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize
[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column
[ https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877233#action_12877233 ]

Ashish Thusoo commented on HIVE-1397:

+1. This would be a cool contribution.

histogram() UDAF for a numerical column
----------------------------------------

Key: HIVE-1397
URL: https://issues.apache.org/jira/browse/HIVE-1397
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Mayank Lahiri
Fix For: 0.6.0

A histogram() UDAF to generate an approximate histogram of a numerical (byte, short, double, long, etc.) column. The result is returned as a map of (x,y) histogram pairs, and can be plotted in Gnuplot using impulses (for example). The algorithm is currently adapted from "A streaming parallel decision tree algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space proportional to the number of histogram bins specified. It has no approximation guarantees, but seems to work well when there is a lot of data and a large number (e.g. 50-100) of histogram bins specified. A typical call might be:

SELECT histogram(val, 10) FROM some_table;

where the result would be a histogram with 10 bins, returned as a Hive map object.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877232#action_12877232 ]

Ashish Thusoo commented on HIVE-1139:

Arvind,

I thought the whole point of this JIRA was to make HashMapWrapper support java.util.Map, no? If that would be a separate JIRA, what would this one be for? Sorry for being a bit dense here, but if you could clarify that would be great.

Thanks,
Ashish

GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
------------------------------------------------------------------------------------------

Key: HIVE-1139
URL: https://issues.apache.org/jira/browse/HIVE-1139
Project: Hadoop Hive
Issue Type: Bug
Reporter: Ning Zhang
Assignee: Arvind Prabhakar

When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OOM exception when there are too many distinct keys for a particular mapper. A workaround is to set the map split size smaller so that each mapper takes a smaller number of rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1398) Support union all without an outer select *
Support union all without an outer select *
--------------------------------------------

Key: HIVE-1398
URL: https://issues.apache.org/jira/browse/HIVE-1398
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo

In Hive, union all queries have to be wrapped in a subquery, as shown below:

select * from (select c1 from t1 union all select c2 from t2);

This JIRA proposes to fix that, to support:

select c1 from t1 union all select c2 from t2;

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877236#action_12877236 ]

Ashish Thusoo commented on HIVE-417:

A couple of comments on this:

A complication that arises from doing a rewrite just after parse is that you lose the ability to report back errors that correspond to the original query. Also, the metadata that you need to do the rewrite is only available after phase 1 of semantic analysis. So in my opinion the rewrite should be done after semantic analysis but before plan generation. Is that what you had in mind... so something like:

[Query parser] -> [Query semantic analysis] -> [Query optimization] -> ...

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872962#action_12872962 ]

Ashish Thusoo commented on HIVE-1373:

One copy is anyway done from lib to dist/lib for these jars. If we go directly to ivy, we would copy things from the ivy cache to dist/lib. So the number of copies in the build process would remain the same, no? There is of course the first-time overhead of downloading these jars from their repos to the ivy cache.

Missing connection pool plugin in Eclipse classpath
---------------------------------------------------

Key: HIVE-1373
URL: https://issues.apache.org/jira/browse/HIVE-1373
Project: Hadoop Hive
Issue Type: Bug
Components: Build Infrastructure
Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
Attachments: HIVE-1373.patch

In a recent checkin, a connection pool dependency was introduced but the eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail.

{code}
hive> show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698
[jira] Assigned: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced
[ https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1368: --- Assignee: Sunil Kumar Sunil, I have added you as a contributor so you can assign JIRAs to yourself. Hive JDBC Integration with SQuirrel SQL Client support Enhanced --- Key: HIVE-1368 URL: https://issues.apache.org/jira/browse/HIVE-1368 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Fix For: 0.5.0 Attachments: Hive JDBC Integration with SQuirrel SQL Client support Enhanced.doc, SQLClient_support.patch Hive JDBC Integration with SQuirrel SQL Client support Enhanced: The Hive JDBC Client was enhanced to browse hive default schema tables through the Squirrel SQL Client. This enhancement helps to browse a hive table's structure, i.e. the table's columns and their data types, in the Squirrel SQL client interface, and SQL queries can also be performed on the tables through the Squirrel SQL client. To enable this, the following Hive JDBC Java files were modified or added: 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are updated. 2. Hive org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support additional JDBC metadata. 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated. 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced
[ https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872965#action_12872965 ] Ashish Thusoo commented on HIVE-1368: - In my opinion the best approach would be to attach this patch to HIVE-1126, label it for 0.5.0 in case others want to use it for 0.5.0, and mark this JIRA as a duplicate of that one. Hive JDBC Integration with SQuirrel SQL Client support Enhanced --- Key: HIVE-1368 URL: https://issues.apache.org/jira/browse/HIVE-1368 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Fix For: 0.5.0 Attachments: Hive JDBC Integration with SQuirrel SQL Client support Enhanced.doc, SQLClient_support.patch Hive JDBC Integration with SQuirrel SQL Client support Enhanced: The Hive JDBC Client was enhanced to browse hive default schema tables through the Squirrel SQL Client. This enhancement helps to browse a hive table's structure, i.e. the table's columns and their data types, in the Squirrel SQL client interface, and SQL queries can also be performed on the tables through the Squirrel SQL client. To enable this, the following Hive JDBC Java files were modified or added: 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are updated. 2. Hive org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support additional JDBC metadata. 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated. 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1346: --- Assignee: Sunil Kumar Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872968#action_12872968 ] Ashish Thusoo commented on HIVE-1346: - Hi Sunil, Have you created this patch on the 0.5.0 branch or on trunk? Are you proposing that this goes into both 0.5.1 and trunk? Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872975#action_12872975 ] Ashish Thusoo commented on HIVE-1346: - @Namit, in what cases would colAlias ever be null? There seems to be code which checks for this around line 3314 in the trunk branch. But afaik we should always be generating a colAlias (at least the default ones). Just wanted to make sure that we are covering all the bases with this fix. Ashish Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1375) dynamic partitions should not create some of the partitions if the query fails
[ https://issues.apache.org/jira/browse/HIVE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872976#action_12872976 ] Ashish Thusoo commented on HIVE-1375: - An example would be great to help explain this problem better. Thanks, Ashish dynamic partitions should not create some of the partitions if the query fails -- Key: HIVE-1375 URL: https://issues.apache.org/jira/browse/HIVE-1375 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Fix For: 0.6.0 Currently, if a bad row exists which cannot be part of a partitioning column, the query fails - but some of the partitions may already have been created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
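An illustrative example of the failure mode described above (table and column names are made up for illustration, and the configuration properties are assumed from the dynamic partition work, not taken from this JIRA):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DynamicPartitionExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    stmt.executeQuery("set hive.exec.dynamic.partition=true");
    stmt.executeQuery("set hive.exec.dynamic.partition.mode=nonstrict");
    // Suppose staging has rows with ds = '2010-05-30', ds = '2010-05-31', and
    // one row whose ds value cannot be used as a partition key. The query
    // fails on the bad row, but partitions for the first two values may
    // already have been created by then, which is the partial state this
    // JIRA is about.
    stmt.executeQuery(
        "INSERT OVERWRITE TABLE logs PARTITION (ds) SELECT msg, ds FROM staging");
    con.close();
  }
}
{code}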
[jira] Commented: (HIVE-1374) Query compile-only option
[ https://issues.apache.org/jira/browse/HIVE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872981#action_12872981 ] Ashish Thusoo commented on HIVE-1374: - Is doing explain on the query enough? Is the proposal to convert queries into explains when run with the -c option? Also consider the following example in a query.hql script: create table foo(bar string); insert overwrite table foo select c1 from old_foo; What would happen to the create statement in this compile-only option? Maybe it is better to provide a switch to do parse-only checks? Query compile-only option - Key: HIVE-1374 URL: https://issues.apache.org/jira/browse/HIVE-1374 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang A compile-only option might be useful for helping users quickly prototype queries, fix errors, and do test runs. The proposed change would be adding a -c switch that behaves like -e but only compiles the specified query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
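A rough sketch of what a compile-only path might look like, assuming it is built on the embedded org.apache.hadoop.hive.ql.Driver (an assumption about the implementation, not the proposed patch). The CREATE TABLE would be checked but never executed, which is why the INSERT that depends on it could still fail to compile:

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

public class CompileOnly {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf(CompileOnly.class);
    SessionState.start(new SessionState(conf));
    Driver driver = new Driver(conf);
    // compile() parses, analyzes and plans the query without running it
    int ret = driver.compile("insert overwrite table foo select c1 from old_foo");
    System.exit(ret); // non-zero on compile errors
  }
}
{code}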
[jira] Updated: (HIVE-1372) New algorithm for variance() UDAF
[ https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1372: Status: Patch Available (was: Open) Hi Mayank, Thanks for the contribution. Please do a submit patch when you put up a patch for a JIRA. Thanks, Ashish New algorithm for variance() UDAF - Key: HIVE-1372 URL: https://issues.apache.org/jira/browse/HIVE-1372 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Priority: Minor Fix For: 0.6.0 Attachments: HIVE-1372.patch A new algorithm for the UDAF that computes variance. This is pretty much a drop-in replacement for the current UDAF, and has two benefits: provably numerically stable (reference included in comments), and reduces arithmetic operations by about half. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1372) New algorithm for variance() UDAF
[ https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1372: --- Assignee: Mayank Lahiri Also I have added you as a contributor, so you should be able to assign JIRAs to yourself. Thanks, Ashish New algorithm for variance() UDAF - Key: HIVE-1372 URL: https://issues.apache.org/jira/browse/HIVE-1372 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Assignee: Mayank Lahiri Priority: Minor Fix For: 0.6.0 Attachments: HIVE-1372.patch A new algorithm for the UDAF that computes variance. This is pretty much a drop-in replacement for the current UDAF, and has two benefits: provably numerically stable (reference included in comments), and reduces arithmetic operations by about half. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
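For context, one widely used numerically stable formulation (Welford's online update); whether the patch uses exactly this recurrence is an assumption, but it illustrates both claimed benefits, stability and fewer arithmetic operations per row:

{code}
public class StableVariance {
  private long n;
  private double mean;
  private double m2; // running sum of squared deviations from the current mean

  public void add(double x) {
    n++;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean); // note: uses the already-updated mean
  }

  public double variance() {
    return n > 1 ? m2 / n : 0.0; // population variance
  }

  public static void main(String[] args) {
    StableVariance v = new StableVariance();
    for (double x : new double[] {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}) {
      v.add(x);
    }
    // Prints 22.5; the naive sum-of-squares formula loses precision badly
    // on data with a large common offset like this.
    System.out.println(v.variance());
  }
}
{code}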
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872990#action_12872990 ] Ashish Thusoo commented on HIVE-1359: - +1 to all the great suggestions in this discussion... I have one more thing to add. Would it be more maintainable to associate the include/exclude information with the test as the key, as opposed to the version being the key? I.e., instead of 0.20.0 include - test1.q, test2.q .. exclude - test3.q 0.17.0 include - test3.q exclude - test1.q we do test1.q exclude - 0.17.0 test2.q include - >= 0.17.0 or something along those lines... this may make adding tests to versions fairly easy. Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Reporter: Ning Zhang Assignee: Ning Zhang Attachments: unit_tests.txt Some features in Hive only work for certain Hadoop versions through shim. However the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we will have different output for different Hadoop versions. One example is CombineHiveInputFormat which is only available from Hadoop 0.20. The plan using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archival partitions (HAR) which is also only available from 0.20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1265) Function Registry should auto-detect UDFs from UDF Description
[ https://issues.apache.org/jira/browse/HIVE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872993#action_12872993 ] Ashish Thusoo commented on HIVE-1265: - Can you explain more about what you mean by it picking up the test class path? When you get the classes for a package, it should return all the classes in that package irrespective of the location. +1 to the general approach here. Function Registry should auto-detect UDFs from UDF Description -- Key: HIVE-1265 URL: https://issues.apache.org/jira/browse/HIVE-1265 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1265-patch.diff We should be able to register functions dynamically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
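A minimal sketch of the general approach being +1'd, assuming a @Description-style annotation that carries the function name (the annotation below is a stand-in, and discovery of candidate classes is deliberately stubbed out, since which classpath gets scanned is exactly the question raised above):

{code}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

public class AutoRegistry {
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Description {
    String name();
  }

  private final Map<String, Class<?>> udfs = new HashMap<String, Class<?>>();

  // Called once per candidate class found by whatever scan is chosen; if the
  // scan walks the whole classpath, test classes would show up here too.
  public void registerIfUdf(Class<?> cls) {
    Description d = cls.getAnnotation(Description.class);
    if (d != null) {
      udfs.put(d.name().toLowerCase(), cls);
    }
  }

  public Class<?> lookup(String name) {
    return udfs.get(name.toLowerCase());
  }
}
{code}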
[jira] Updated: (HIVE-1371) remove blank in rcfilecat
[ https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1371: Status: Patch Available (was: Open) Hi Yongqiang, Please do a submit patch when putting up a patch. Thanks, Ashish remove blank in rcfilecat - Key: HIVE-1371 URL: https://issues.apache.org/jira/browse/HIVE-1371 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive.1371.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1371) remove blank in rcfilecat
[ https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872995#action_12872995 ] Ashish Thusoo commented on HIVE-1371: - +1. Will commit. remove blank in rcfilecat - Key: HIVE-1371 URL: https://issues.apache.org/jira/browse/HIVE-1371 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive.1371.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()
[ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872997#action_12872997 ] Ashish Thusoo commented on HIVE-1369: - I do not see any drawbacks here. I think another requirement from this was that the serialization be such that it is order preserving wherever there is a notion of order, as this serde could also be used to serialize between map/reduce boundaries. So if the implementation takes care of that and does not introduce overhead, I think this would be good. Others, what do you think about this? Ashish LazySimpleSerDe should be able to read classes that support some form of toString() --- Key: HIVE-1369 URL: https://issues.apache.org/jira/browse/HIVE-1369 Project: Hadoop Hive Issue Type: Improvement Reporter: Alex Kozlov Priority: Minor Original Estimate: 2h Remaining Estimate: 2h Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects. It should be pretty easy to extend the class to read any object that implements the toString() method. Ideas or concerns? Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
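A small sketch of the extension under discussion, not LazySimpleSerDe itself: accept Text and BytesWritable as today, and fall back to toString() for anything else. The order-preservation caveat raised above is visible here, because text produced this way only sorts correctly if toString() itself is order preserving (e.g. zero-padded numbers), which is a requirement on the row object rather than on the serde:

{code}
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class ToStringRows {
  public static Text toText(Object row) {
    if (row instanceof Text) {
      return (Text) row;
    }
    if (row instanceof BytesWritable) {
      BytesWritable bw = (BytesWritable) row;
      Text t = new Text();
      t.set(bw.getBytes(), 0, bw.getLength());
      return t;
    }
    // Any other class: rely on whatever toString() the row object provides.
    return new Text(row.toString());
  }
}
{code}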
[jira] Assigned: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1373: --- Assignee: Vinithra Varadharajan Have added you to the contributors so you should be able to assign things to yourself now. Thx. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Priority: Minor Attachments: HIVE-1373.patch In a recent checkin, connection pool dependency was introduced but eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. {code} hive show tables; show tables; 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472) at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458) at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303) Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547) at 
org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:153
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872237#action_12872237 ] Ashish Thusoo commented on HIVE-1373: - +1. Looks good to me. I think in future we should move all the lib dependencies in the eclipse files to come from build/dist/lib as that will help us migrate more stuff over to ivy. Will run tests and commit once the tests pass. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Priority: Minor Attachments: HIVE-1373.patch In a recent checkin, connection pool dependency was introduced but eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. {code} hive show tables; show tables; 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472) at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458) at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303) Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395) at 
org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547) at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191
[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
[ https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872238#action_12872238 ] Ashish Thusoo commented on HIVE-802: Should we just mark this as a duplicate of 1176 in that case? Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it - Key: HIVE-802 URL: https://issues.apache.org/jira/browse/HIVE-802 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Todd Lipcon Assignee: Arvind Prabhakar There's a bug in DataNucleus that causes this issue: http://www.jpox.org/servlet/jira/browse/NUCCORE-371 To reproduce, simply put your hive source tree in a directory that contains a '+' character. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simultaneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872240#action_12872240 ] Ashish Thusoo commented on HIVE-80: --- Yes, I think what Ning is saying is correct. We should however add a test case to the unit tests to check that. I am not sure that we added a test case for the parallel execution stuff. Allow Hive Server to run multiple queries simultaneously Key: HIVE-80 URL: https://issues.apache.org/jira/browse/HIVE-80 Project: Hadoop Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Raghotham Murthy Assignee: Neil Conway Priority: Critical Attachments: hive_input_format_race-2.patch Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
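A minimal sketch of the "one driver object per query" idea from the issue description, assuming the server-side handler owns a HiveConf (result fetching and error handling omitted for brevity):

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;

public class PerQueryDriver {
  private final HiveConf conf;

  public PerQueryDriver(HiveConf conf) {
    this.conf = conf;
  }

  public int execute(String query) {
    // A fresh Driver per query keeps per-query state (plan, fetch task, etc.)
    // from being shared across concurrent requests.
    Driver driver = new Driver(conf);
    return driver.run(query); // returns 0 on success in this era's Driver API
  }
}
{code}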
RE: [DISCUSSION] To be (or not to be) a TLP - that is the question
What is the advantage of becoming a TLP to the project itself? I have heard that it is something that apache wants, but considering that we are very comfortable with how Hive interacts with the Hadoop ecosystem as a sub project of Hadoop, there has to be some big incentive for the project to be a TLP, and nowhere have I seen how this would benefit Hive. Any thoughts on that? Ashish From: Jeff Hammerbacher [mailto:ham...@cloudera.com] Sent: Wednesday, April 21, 2010 7:35 PM To: hive-dev@hadoop.apache.org Cc: Ashish Thusoo Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question Hive already does the work to run on multiple versions of Hadoop, and the release cycle is independent of Hadoop's. I don't see why it should remain a subproject. I'm +1 on Hive becoming a TLP. On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao zsh...@gmail.com wrote: As a Hive committer, I don't feel the benefit we get from becoming a TLP is big enough (compared with the cost) to make Hive a TLP. From Chris's comment I see that the cost is not that big, but I still wonder what benefit we will get from that. Also I didn't get the idea of the joke (In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP). I don't see any reason that applies to Pig but not Hive. We should continue the discussion here, but anything in Pig's discussion should also be considered here. Zheng On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah a...@cloudera.com wrote: I am personally +1 on Hive being a TLP, I think it did reach the community adoption and maturity level required for that. In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP :) (jk). The real question to ask is whether there is a volunteer to take care of the administrative tasks, which isn't a ton of work afaiu (I am willing to volunteer if nobody else is up to the task, but I am not a committer and only contributed a minor patch for bash/cygwin). BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP tradeoffs. I happen to agree with all he says, and frankly I couldn't have written it better myself. I highlight certain parts from his message, but I recommend you read the whole thing. -- Forwarded message -- From: Chris Douglas cdoug...@apache.org Date: Tue, Apr 13, 2010 at 11:46 PM Subject: Subprojects and TLP status To: gene...@hadoop.apache.org, priv...@hadoop.apache.org Most of Hadoop's subprojects have discussed becoming top-level Apache projects (TLPs) in the last few weeks. Most have expressed a desire to remain in Hadoop. The salient parts of the discussions I've read tend to focus on three aspects: a technical dependence on Hadoop, additional overhead as a TLP, and visibility both within the Hadoop ecosystem and in the open source community generally. Life as a TLP: this is not much harder than being a Hadoop subproject, and the Apache preferences being tossed around - particularly insufficiently diverse - are not blockers. Every subproject needs to write a section of the report Hadoop sends to the board; almost the same report, sent to a new address. The initial cost is similarly light: copy bylaws, send a few notes to INFRA, and follow some directions. I think the estimated costs are far higher than they will be in practice. Inertia is a powerful force, but it should be overcome.
The directions are here, and should not be intimidating: http://apache.org/dev/project-creation.html Visibility: the Hadoop site does not need to change. For each subproject, we can literally change the hyperlinks to point to the new page and be done. Long-term, linking to all ASF projects that run on Hadoop from a prominent page is something we all want. So particularly in the medium-term that most are considering: visibility through the website will not change. Each subproject will still be linked from the front page. Hadoop would not be nearly as popular as it is without Zookeeper, HBase, Hive, and Pig. All statistics on work in shared MapReduce clusters show that users vastly prefer running Pig and Hive queries to writing MapReduce jobs. HBase continues to push features in HDFS that increase its adoption and relevance outside MapReduce, while sharing some of its NoSQL limelight. Zookeeper is not only a linchpin in real workloads, but many proposals for future features require it. The bottom line is that MapReduce and HDFS need these projects for visibility and adoption in precisely the same way. I don't think separate TLPs will uncouple the broader community from one another. Technical dependence: this has two dimensions. First, influencing MapReduce and HDFS. This is nonsense. Earning influence by contributing to a subproject is the only way to push code changes
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859956#action_12859956 ] Ashish Thusoo commented on HIVE-987: I am +1 on this. I think this can open up good possibilities. I have not looked at the sqlline code, but how much does it depend on the actual SQL dialect? Plus, how easy is it to extend to hdfs-related commands? E.g., the CLI today has commands that can set conf variables. It also supports the hadoop dfs commands, which talk directly to hdfs. I am not sure if too many people use them, but I do. Would be great to get them integrated with sqlline if that is possible. Hive CLI Omnibus Improvement ticket --- Key: HIVE-987 URL: https://issues.apache.org/jira/browse/HIVE-987 Project: Hadoop Hive Issue Type: Improvement Reporter: Carl Steinbach Attachments: HIVE-987.1.patch, sqlline-1.0.8_eb.jar Add the following features to the Hive CLI: * Command History * ReadLine support ** HIVE-120: Add readline support/support for alt-based commands in the CLI ** Java-ReadLine is LGPL, but it depends on GPL readline library. We probably need to use JLine instead. * Tab completion ** HIVE-97: tab completion for hive cli * Embedded/Standalone CLI modes, and ability to connect to different Hive Server instances. ** HIVE-818: Create a Hive CLI that connects to hive ThriftServer * .hiverc configuration file ** HIVE-920: .hiverc doesnt work * Improved support for comments. ** HIVE-430: Ability to comment desired for hive query files * Different output formats ** HIVE-49: display column header on CLI ** XML output format For additional inspiration we may want to look at the Postgres psql shell: http://www.postgresql.org/docs/8.1/static/app-psql.html Finally, it would be really cool if we implemented this in a generic fashion and spun it off as an apache-commons shell framework. It seems like most of the Apache Hadoop projects have their own shells, and I'm sure the same is true for non-Hadoop Apache projects as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1320) NPE with lineage in a query of union alls on joins.
NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1320: Attachment: HIVE-1320.patch Fixed the NPE. The cause was that we were not checking whether inp_dep was null in the union all code path. We have to do that for all operators that have more than one parent. NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1320.patch The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
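The shape of the fix described above, as a self-contained sketch with made-up types (only the inp_dep name comes from the comment): when an operator has multiple parents, such as the UNION ALL over two joins in this query, a parent may contribute no dependency, and that null has to be skipped instead of merged:

{code}
import java.util.List;

public class MergeSketch {
  static class Dependency {
    static Dependency merge(Dependency a, Dependency b) {
      return a == null ? b : a; // placeholder for the real merge logic
    }
  }

  static Dependency mergeParents(List<Dependency> parentDeps) {
    Dependency dep = null;
    for (Dependency inp_dep : parentDeps) {
      if (inp_dep != null) { // the missing null check that caused the NPE
        dep = Dependency.merge(dep, inp_dep);
      }
    }
    return dep;
  }
}
{code}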
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1320: Status: Patch Available (was: Open) Affects Version/s: 0.6.0 Fix Version/s: 0.6.0 NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Ashish Thusoo Assignee: Ashish Thusoo Fix For: 0.6.0 Attachments: HIVE-1320.patch The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[DISCUSSION] To be (or not to be) a TLP - that is the question
Hi Folks, Recently the Apache Board asked the Hadoop PMC if some sub projects can become top level projects. In the opinion of the board, big umbrella projects make it difficult to monitor the health of the communities within the sub projects. If Hive does become a TLP, then we would have to elect our own PMC and take on all the administrative tasks that the Hadoop PMC does for us. So there is definitely more administrative work involved as a TLP. So the question is whether we should take on this additional task at this time and what tangible advantages and disadvantages such a move would entail for the project. Would like to hear what the community thinks on this issue. Thanks, Ashish PS: As some reference to what is happening in the other subprojects, at this time PIG and Zookeeper have decided NOT to become TLPs whereas HBase and Avro have decided to become TLPs.
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857110#action_12857110 ] Ashish Thusoo commented on HIVE-1293: - I would vote for versioning. Since we do not have to deal with the complexity of a buffer cache, I think this would be much simpler to implement than what is possible in traditional databases. At the same time, for locks we will have to do a lease based mechanism anyway in order to protect against locks leaking because of client crashes. And when you account for that, it seems that locking would not be significantly simpler to implement than versioning. Concurrency Model for Hive - Key: HIVE-1293 URL: https://issues.apache.org/jira/browse/HIVE-1293 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Concurrency model for Hive: Currently, hive does not provide a good concurrency model. The only guarantee provided in case of concurrent readers and writers is that a reader will not see partial data from the old version (before the write) and partial data from the new version (after the write). This has come across as a big problem, especially for background processes performing maintenance operations. The following possible solutions come to mind. 1. Locks: Acquire read/write locks - they can be acquired at the beginning of the query or the write locks can be delayed till the move task (when the directory is actually moved). Care needs to be taken for deadlocks. 2. Versioning: The writer can create a new version if the current version is being read. Note that it is not equivalent to snapshots; the old version can only be accessed by the current readers, and will be deleted when all of them have finished. Comments. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
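A toy sketch of the versioning option as described in the issue (nothing here is from an actual Hive patch): readers pin the version that was current when they started, a writer publishes a new version without touching pinned ones, and an old version is deleted once its last reader finishes. A real implementation would also need to close the small window between fetching and pinning a version, and combine this with the lease idea above to survive reader crashes:

{code}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class VersionedTable {
  public static final class Version {
    final String dataLocation;
    final AtomicInteger readers = new AtomicInteger();
    Version(String dataLocation) { this.dataLocation = dataLocation; }
  }

  private final AtomicReference<Version> current =
      new AtomicReference<Version>(new Version("/warehouse/t/v0"));

  public Version openForRead() {
    Version v = current.get();
    v.readers.incrementAndGet(); // pin: deletion waits for this reader
    return v;
  }

  public void closeRead(Version v) {
    if (v.readers.decrementAndGet() == 0 && current.get() != v) {
      delete(v); // last reader of a superseded version cleans it up
    }
  }

  public void publish(String newLocation) {
    Version old = current.getAndSet(new Version(newLocation));
    if (old.readers.get() == 0) {
      delete(old); // nobody was reading the old version
    }
  }

  private void delete(Version v) {
    System.out.println("deleting " + v.dataLocation);
  }
}
{code}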
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_8.patch Another one with test fixes. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch, HIVE-1131_8.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_6.patch With fixes to tests and with null dropped. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_7.patch Another patch which fixes the QueryPlan to have LinkedHashMaps as that was also creating instability in the tests. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Release Note: This changes the signature of PostExecute.java Hadoop Flags: [Incompatible change] Status: Patch Available (was: Open) submitting. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852013#action_12852013 ] Ashish Thusoo commented on HIVE-1131: - I looked at the ExecutionCtx stuff. There are at least 3 different unrelated fields in SessionState that we should also move to the ExecutionCtx. I will file a follow-up JIRA for it, but I think we should get this one in. I did see some test failures due to using HashMaps and the consequent change in ordering after I refreshed. Will fix that and upload a new patch. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_5.patch Added a more centralized function to decide the dependency type. Also reduced the number of dependency types to SIMPLE, EXPRESSION and SCRIPT. SIMPLE = a copy of the column, EXPRESSION = UDF, UDAF, UDTF or union all, SCRIPT = if a user script is used. Also fixed the HashMap to LinkedHashMap. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
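The three dependency types as just described, rendered as an enum for clarity (the enum form is a sketch, not the patch itself):

{code}
public enum DependencyType {
  SIMPLE,     // the destination column is a plain copy of a source column
  EXPRESSION, // produced by a UDF, UDAF, UDTF, or a union all
  SCRIPT      // produced through a user script
}
{code}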
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851674#action_12851674 ] Ashish Thusoo commented on HIVE-1131: - Look at the DataContainer class. That has a partition in it. And the Dependency has a mapping from Partition to the dependencies. Can you explain your concerns about inefficiency in more detail? For S6, the query plan is actually the wrong place to store the lineage info. Because of the dynamic partitioning work that Ning is doing, I have to generate the partition to dependency mapping at run time. So I would rather store it in a run time structure as opposed to a compile time structure. SessionState fits that bill, though I think we should have another structure called ExecutionCtx for this. But otherwise I think we want to store this in a runtime structure. For S2, I will add some more comments. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
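A rough rendering of the structure being described, with assumed field shapes: a DataContainer identifies the written table or partition, and the lineage index maps each (container, column) pair to its dependency, which is why partition-level entries can only be filled in at run time once dynamic partitions are known:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class LineageSketch {
  static class DataContainer {
    final String table;
    final String partition; // null for unpartitioned writes
    DataContainer(String table, String partition) {
      this.table = table;
      this.partition = partition;
    }
  }

  static class Dependency {} // dependency type plus its base columns

  // LinkedHashMap keeps iteration order deterministic, the same concern that
  // motivated the HashMap to LinkedHashMap fixes in the later patches.
  private final Map<DataContainer, Map<String, Dependency>> index =
      new LinkedHashMap<DataContainer, Map<String, Dependency>>();
}
{code}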
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_2.patch Patch with all the review comments incorporated. This is just the source patch. Will be uploading the fixed tests shortly. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849624#action_12849624 ] Ashish Thusoo commented on HIVE-1131: - Comment 3 from Raghu and comments S2-S4 from Zheng are not yet incorporated. The new patch overhauls things a bit to support partition-level lineage and does this in a post-execute hook. It gets rid of the visits and the iterator classes. Will fix the other comments in the patch with the test cases. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_3.patch This fixes all the review comments. Will post the patch with tests separately. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849633#action_12849633 ] Ashish Thusoo commented on HIVE-1131: - Also, I did not find any instance of S3 in the code. Perhaps you just mentioned it for completeness, but in case you do find an instance, please let me know the offending file. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_4.patch This patch has all the tests updated as well. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [ANNOUNCEMENT] Contributor Workshop at Yahoo!
Sounds like a good idea to me. If anyone @FB wants to join, maybe they could do it with you. Ashish -Original Message- From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Thursday, March 25, 2010 2:09 PM To: hive-dev@hadoop.apache.org Subject: Re: [ANNOUNCEMENT] Contributor Workshop at Yahoo! I'm happy to organize this if no one else wants to. Let me know if there are any objections. Otherwise I will send an email to the Y! at the end of the day. Thanks. Carl On Thu, Mar 25, 2010 at 11:14 AM, Jeff Hammerbacher ham...@cloudera.com wrote: Has someone already emailed about a Hive workshop? On Thu, Mar 25, 2010 at 10:33 AM, Owen O'Malley o...@yahoo-inc.com wrote: Yahoo is organizing Contributor's Workshops on the day after the Hadoop Summit (10 June 2010) for both Hadoop Core (HDFS and MapReduce) and Pig. We would be happy to provide space for any of the other Hadoop sub-projects as well! If you are interested in organizing such a workshop for one of the Hadoop sub-projects, please email us at hadoopcontributorr...@yahoo-inc.com with WORKSHOP ORGANIZER (project) in the subject line. See you all at the Hadoop Summit - June 29th, http://www.hadoopsummit.org/ Thanks, Owen O'Malley Eric Baldeschwieler
[jira] Commented: (HIVE-1117) Make QueryPlan serializable
[ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833117#action_12833117 ] Ashish Thusoo commented on HIVE-1117: - What would be the advantage of using Avro here? We do not really have a requirement for cross-language clients for this, do we? To me, throwing Avro into the mix just adds another dependency that is not really needed... no? Make QueryPlan serializable --- Key: HIVE-1117 URL: https://issues.apache.org/jira/browse/HIVE-1117 Project: Hadoop Hive Issue Type: Improvement Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.6.0 We need to make QueryPlan serializable so that we can resume the query some time later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
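The trade-off being weighed here: plain JDK serialization already handles an arbitrary object graph with zero added dependencies, at the cost of cross-language readability and brittle versioning -- which is exactly what Avro would buy. A minimal JDK-only sketch with a toy stand-in class (the real QueryPlan is far richer):
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a query plan; the real class holds tasks, stages, etc.
class ToyPlan implements Serializable {
  private static final long serialVersionUID = 1L;
  String queryString;
  List<String> tasks = new ArrayList<String>();
}

public class PlanSerDemo {
  public static void main(String[] args) throws Exception {
    ToyPlan plan = new ToyPlan();
    plan.queryString = "select key from src";
    plan.tasks.add("MapRedTask-1");

    // Serialize: what "make QueryPlan serializable" buys without new deps.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(buf);
    out.writeObject(plan);
    out.close();

    // Deserialize later -- e.g. to resume the query in another process.
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
    ToyPlan resumed = (ToyPlan) in.readObject();
    System.out.println(resumed.queryString + " / " + resumed.tasks);
  }
}
{code}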
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131.patch This is just the source patch. Will publish the test patch soon. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1123) Checkstyle fixes
[ https://issues.apache.org/jira/browse/HIVE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829816#action_12829816 ] Ashish Thusoo commented on HIVE-1123: - Apart from the indentation of the throws clause, is there any other major sticking point? Personally speaking, I don't have a strong preference for the indentation of throws. Going with 2 indents probably makes it easier for Eclipse to catch this. @Carl I do think that there is value in publishing the entire set of rules that you have used. Checkstyle fixes Key: HIVE-1123 URL: https://issues.apache.org/jira/browse/HIVE-1123 Project: Hadoop Hive Issue Type: Task Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-1123.checkstyle.patch, HIVE-1123.cli.2.patch, HIVE-1123.cli.patch, HIVE-1123.common.2.patch, HIVE-1123.common.patch, HIVE-1123.contrib.2.patch, HIVE-1123.contrib.patch, HIVE-1123.hwi.2.patch, HIVE-1123.hwi.patch, HIVE-1123.jdbc.2.patch, HIVE-1123.jdbc.patch, HIVE-1123.metastore.2.patch, HIVE-1123.metastore.patch, HIVE-1123.ql.2.patch, HIVE-1123.ql.patch, HIVE-1123.serde.2.patch, HIVE-1123.serde.patch, HIVE-1123.service.2.patch, HIVE-1123.service.patch, HIVE-1123.shims.2.patch, HIVE-1123.shims.patch Fix checkstyle errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
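For readers without the Checkstyle configuration at hand, the sticking point is where a wrapped throws clause lands. An illustration of the two-indent continuation style being discussed, assuming two-space indent units (the actual ruleset was not published in this thread):
{code}
public class ThrowsIndentDemo {
  // The continuation line below sits at two indent units (four spaces),
  // which is mechanical enough for Eclipse and Checkstyle to enforce.
  static void loadPartition(String table, String part)
      throws IllegalArgumentException, IllegalStateException {
    if (table == null || part == null) {
      throw new IllegalArgumentException("table and part must be non-null");
    }
  }

  public static void main(String[] args) {
    loadPartition("srcpart", "ds=2008-04-08");
    System.out.println("ok");
  }
}
{code}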
[jira] Created: (HIVE-1131) Add column lineage information to the pre execution hooks
Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
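The hook mechanism this issue builds on: Hive lets a deployment register classes that the driver invokes before running a query, and the proposal is to hand those classes the lineage structures. A hypothetical sketch of an auditing client -- the interface and method names here are illustrative, not the real Hive API (see the patches on this issue for that):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hook shape; not the actual Hive interface.
interface PreExecHook {
  // lineage: output column -> human-readable source expression
  void run(String queryId, Map<String, String> lineage) throws Exception;
}

// An auditing client of the kind the description mentions.
public class AuditHook implements PreExecHook {
  public void run(String queryId, Map<String, String> lineage) {
    for (Map.Entry<String, String> e : lineage.entrySet()) {
      // A real hook might persist this to an audit store, or refuse to
      // run the query if a dependency check fails.
      System.out.println(queryId + ": " + e.getKey() + " <- " + e.getValue());
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> lineage = new LinkedHashMap<String, String>();
    lineage.put("dest.key", "src.key");
    lineage.put("dest.cnt", "count(src.value)");
    new AuditHook().run("query_0001", lineage);
  }
}
{code}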
RE: HIVE-49 and other forms of CLI niceness
Looks like a good suggestion. Ideally, the driver code should return a structure that encodes the columns separately, as opposed to the single serialized string of today, and the formatting logic should all be in the CliDriver. Ashish -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, January 27, 2010 1:00 PM To: hive-u...@hadoop.apache.org Subject: HIVE-49 and other forms of CLI niceness All, Some simple features in Hive can really bring down the learning curve for new users. I am teaching someone how to use Hive. A buddy of mine did this. hive> select * from mt_date_test; OK a 2010-01-01 NULL b 2009-12-31 NULL c 2010-01-27 NULL hive> select * from mt_date_test where my_date > '2010-01-01'; 2010-01-27 08:18:27,008 map = 100%, reduce = 100% Ended Job = job_200909171715_20264 OK I instantly suspected 1) whitespace 2) delimiters hive> select key from mt_date_test; OK a 2010-01-01 b 2009-12-31 c 2010-01-27 !!BINGO!! Should we use a pipe | or some other column delimiter like the mysql CLI does? and have this be a property that is on by default hive.cli.columnseparator='\t' hive.cli.columnseparator='|' In its current state the user understandably made the assumption that '>' does not work on strings. Should we expose the format of the results in Driver so that the CLI can effectively split the rows by column?
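The refactoring suggested in Ashish's reply above -- the driver returns columns, the CLI owns formatting -- is easy to sketch. Everything below is hypothetical, including treating hive.cli.columnseparator as a real setting (it is only Edward's proposal):
{code}
import java.util.Arrays;
import java.util.List;

public class CliFormatSketch {
  // What the driver would return: columns, not a pre-joined string.
  static List<String> fetchRow() {
    return Arrays.asList("a", "2010-01-01");
  }

  // The CLI joins columns with whatever separator is configured.
  static String format(List<String> row, String separator) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < row.size(); i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(row.get(i));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Default '\t' reproduces today's output; '|' makes column
    // boundaries visible, as in the mysql CLI.
    System.out.println(format(fetchRow(), "\t"));
    System.out.println(format(fetchRow(), "|"));
  }
}
{code}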
RE: Hive in maven
Yes, you should open a JIRA for this. Ashish -Original Message- From: Gerrit [mailto:gvanvuu...@specificmedia.com] Sent: Friday, January 22, 2010 7:33 AM To: hive-dev@hadoop.apache.org Subject: Re: Hive in maven Hi, Yes, I'll start on creating the pom.xml. The fastest and recommended way of doing this is by having a maven repo and syncing it with the official maven repos (one-way). Also, future hive releases would be just a matter of uploading to this repo, and it is automatically synced with the official maven repo. If a manual upload is requested it takes more time (it says that on their website). Shall I open a jira for this? Cheers, Gerrit On Thu, 2010-01-21 at 12:28 -0800, Yongqiang He wrote: Hi Gerrit, Can you help uploading to maven? Thanks Yongqiang On 1/20/10 2:21 AM, Gerrit gvanvuu...@specificmedia.com wrote: Yep: The main maven page is: http://maven.apache.org/guides/mini/guide-central-repository-upload.html (see section Sync'ing your own repository to the central repository automatically) For groupId and artifactId conventions see: http://maven.apache.org/guides/mini/guide-naming-conventions.html I have been a maven user for some time now and can help out to make the pom, document how to set up and deploy, if you need help. For internal repos you could use: http://nexus.sonatype.org/ http://www.jfrog.org/products.php On Tue, 2010-01-19 at 23:31 -0800, Zheng Shao wrote: This is a good idea. Can you point us to some references on how to upload it to maven? Zheng On Mon, Jan 18, 2010 at 1:20 PM, Gerrit gvanvuu...@specificmedia.com wrote: Hi guys, Would it be possible to add the hive jars to the main maven repo? If there are no objections I can make the request to the main repo if you agree. The reason for this need is that I've created a Loader for the pig project to read HiveRCTables (https://issues.apache.org/jira/browse/PIG-1117) and currently use ant to directly download the libraries from the apache site using: <get verbose="true" src="${apache.dist.site}/${hive.groupId}/${hive.artifactId}/${hive.artifactId}-${hive.version}/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz" dest="lib-hivedeps/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"/> I would much prefer using ivy or maven and it makes this much cleaner. Thanks, Gerrit
RE: Unit test result depends on platform.
Can you file a JIRA and give us the unit tests that fail? That would be very helpful. I suspect some of the test queries may be missing a sort by clause, so they could have different sort orders compared to the expected output. Ashish -Original Message- From: Mafish Liu [mailto:maf...@gmail.com] Sent: Monday, January 18, 2010 5:30 PM To: hive-dev@hadoop.apache.org Subject: Re: Unit test result depends on platform. Attachments are listing programs. -- maf...@gmail.com
RE: New Hive committer Ning Zhang
Congrats!! Ashish -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Monday, January 11, 2010 11:51 AM To: hive-dev@hadoop.apache.org Subject: New Hive committer Ning Zhang Ning has done a lot of work on Hive. Hadoop PMC recently approved Ning Zhang as a new committer to Hive. Congratulations Ning! -- Yours, Zheng
[jira] Commented: (HIVE-972) support views
[ https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793711#action_12793711 ] Ashish Thusoo commented on HIVE-972: Pretty comprehensive writeup :) Here are my comments: 1. It may be better to just go with a flat model to keep things simple. Also, whenever we do materialized views in the future, you do have an object that is part table and part view, and you may just need the flat model anyway at that point. The primary reason to go with the flat model, though, would be simplicity and a less severe migration of the metastore schema. 2. For dependency tracking, there is already code in Hive that uses pre-execution hooks to track lineage. That could easily be used to extract view dependencies (table-level dependencies) when you create the view metadata. Raghu also did some work on column lineage, and perhaps that can be used to capture column lineage. I think for the first cut we should just go with table dependencies and leave the column stuff for later. We should have the lenient dependency invalidation scheme (perhaps for both drops and alters) because at least that way users can inspect view definitions and then fix them later; see the sketch after this comment. Accordingly, we would need a flag to mark an invalidated view and maybe some way of looking at that list. I think we can punt the cascade option for now as that seems to be an optimization in the user workflow and could be added later. Thoughts? The restrict option, though, is probably more useful. We could have that be the default in the strict mode (Hive has a strict mode which disallows queries on partitioned tables in case a where clause on the partition column was not specified). Not sure what we should do about temporary functions, but if we use views to transform our internal logs to another schema (nectar imps - context) then we may need it. 3. I am not sure if supporting limit is important, but I can see good use of order by when we do materialized views. The sorted property could be helpful there and would be good to capture. We already capture those for tables. 4. I think the fast path should work seamlessly once the fast path with filters is done, no? 5. I think we can punt view modification for now if we support ways of inspecting the view SQL for folks. support views - Key: HIVE-972 URL: https://issues.apache.org/jira/browse/HIVE-972 Project: Hadoop Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Namit Jain Assignee: John Sichi Hive currently does not support views. It would be a very nice feature to have. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
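To make point 2 above concrete, lenient invalidation means a DROP TABLE never blocks on dependent views; it just flags them so users can inspect and repair them later. A toy model of that bookkeeping -- every name here is illustrative, and nothing is metastore API:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewInvalidationSketch {
  // view name -> tables it reads (table-level dependencies only, per point 2)
  private final Map<String, List<String>> deps =
      new HashMap<String, List<String>>();
  // views flagged invalid, kept around so users can list and fix them
  private final List<String> invalid = new ArrayList<String>();

  void createView(String view, String... tables) {
    deps.put(view, Arrays.asList(tables));
  }

  // Lenient drop: never fails, just marks dependent views invalid.
  void dropTable(String table) {
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      if (e.getValue().contains(table) && !invalid.contains(e.getKey())) {
        invalid.add(e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    ViewInvalidationSketch m = new ViewInvalidationSketch();
    m.createView("v_sessions", "raw_logs", "users");
    m.dropTable("raw_logs");
    System.out.println("invalid views: " + m.invalid); // [v_sessions]
  }
}
{code}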
RE: [VOTE] hive release candidate 0.4.1-rc3
+1 on the basis of tests run on the dev tar ball. Ashish -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Monday, November 30, 2009 11:37 AM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc3 I tried binary tarball with both hadoop 0.17 and 0.20 and both worked. Please vote. Zheng On Fri, Nov 27, 2009 at 7:32 AM, Zheng Shao zsh...@gmail.com wrote: One more modification to the Tarballs: Location moved to http://people.apache.org/~zshao/hive-0.4.1-candidate-3/ I also made both the source tarball and binary tarball. Zheng On Sat, Nov 21, 2009 at 11:03 AM, Zheng Shao zsh...@gmail.com wrote: I forgot to modify the version in build.properties and make the tarball. Here it is: svn: https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1-rc3/ Tarball: http://people.apache.org/~zshao/hive-0.4.1-dev.tar.gz Please vote. Zheng On Sun, Nov 15, 2009 at 1:14 AM, Ashish Thusoo athu...@facebook.com wrote: Zheng, I cannot find the tar ball. What is the location? Ashish From: Zheng Shao [zsh...@gmail.com] Sent: Thursday, November 12, 2009 4:14 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc2 Please vote. We would like release 0.4.1 to go out as soon as possible since it fixed some critical bugs in 0.4.0. Zheng On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote: I have made a release candidate 0.4.1-rc2. We've fixed several critical bugs to hive release 0.4.0. We need hive release 0.4.1 out asap. Here are the list of changes: HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka) HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao) HIVE-878. Update the hash table entry before flushing in Group By hash aggregation (Zheng Shao via namit) HIVE-882. Create a new directory every time for scratch. (Namit Jain via zshao) HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao) HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao) HIVE-883. URISyntaxException when partition value contains special chars. (Zheng Shao via namit) * HIVE-902. Fix cli.sh to work with hadoop versions less than 20. (Carl Steinbach via zshao) *: New since release candidate 0.4.1-rc0. Please vote. -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng
[jira] Created: (HIVE-939) Extend hive streaming to support counter updates similar to hadoop streaming.
Extend hive streaming to support counter updates similar to hadoop streaming. - Key: HIVE-939 URL: https://issues.apache.org/jira/browse/HIVE-939 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo The code to update hadoop counters needs to be ported from hadoop streaming to the streaming code in Hive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] hive release candidate 0.4.1-rc2
Zheng, I cannot find the tar ball. What is the location? Ashish From: Zheng Shao [zsh...@gmail.com] Sent: Thursday, November 12, 2009 4:14 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc2 Please vote. We would like release 0.4.1 to go out as soon as possible since it fixed some critical bugs in 0.4.0. Zheng On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote: I have made a release candidate 0.4.1-rc2. We've fixed several critical bugs to hive release 0.4.0. We need hive release 0.4.1 out asap. Here are the list of changes: HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka) HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao) HIVE-878. Update the hash table entry before flushing in Group By hash aggregation (Zheng Shao via namit) HIVE-882. Create a new directory every time for scratch. (Namit Jain via zshao) HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao) HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao) HIVE-883. URISyntaxException when partition value contains special chars. (Zheng Shao via namit) * HIVE-902. Fix cli.sh to work with hadoop versions less than 20. (Carl Steinbach via zshao) *: New since release candidate 0.4.1-rc0. Please vote. -- Yours, Zheng -- Yours, Zheng
RE: Hive Performance
There are a bunch of optimizations that deal with skewed data in Hive as well. The optimizer is rule-based and the user has to hint the query - similar to what is done in an RDBMS. We have mostly done our performance work on the benchmark published in the SIGMOD paper. Ashish -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Saturday, November 07, 2009 11:19 AM To: hive-dev@hadoop.apache.org Subject: Re: Hive Performance A friend and I were discussing Pig vs Hive in general yesterday. On the surface, Hive is an SQL-like language; Pig is its own language, 'Pig Latin'. However, in the end I think they both end up doing column projections, joins, etc. In the end it is a similar operation happening on the same cluster, so performance-wise I expect the performance will eventually be similar. Now, Pig offering more SQL support is a large undertaking. While Pig looks very versatile, I recently emulated the example on Cloudera's blog for geo-IP locating traffic in Pig. I did this in Hive with an external Perl script using map/transform (it did not take a page-long Pig program). I also think the Hive UDF framework can be used in place of many Piggybank functions. Also, unless I am missing something, a UDF is native Java. Seems like Piggybank functions are going to be piping/streaming output; I can't see that performing better. To backtrack: if Pig adds SQL, will we need Hive? If Hive adds something like T-SQL, will we need Pig? On 11/7/09, Rob Stewart robstewar...@googlemail.com wrote: Hi there. I'm in the process of writing a paper, and part of it I aim to write (yet another) comparative study on various interfaces with Hadoop. This will almost certainly include Pig and Hive, probably MapReduce, and maybe JAQL. I have read the papers published on the Hive JIRA (pig vs hive vs MapReduce for 2 queries, an aggregation, and a join). I am, however, wanting to know a bit from the Hive community. 1. Do you guys (the Hive developers) have a standardized benchmarking tool to use prior to each Hive release? I am thinking of something similar to PigMix, used by the Pig developers. In case you don't know, PigMix is a set of 12 designed queries, implemented in Pig and Java Hadoop, and comparisons are made on execution time. Does the Hive community have something similar? 2. The Pig wiki points out some unique features of Pig that allow optimal execution performance. For instance, they have methods to optimize queries on skewed data (by taking samples of the data for reduce key allocations). Is there something about the implementation of Hive that gives it some functionality not found in other interfaces? And better still, would there be some Hive implementation that could work as a proof of concept to show any optimized features of Hive? 3. One section suggested for investigation within the Pig development team is to create a SQL-like language that could be compiled down through Pig to MR jobs. If such a project was to achieve parity with Hive's SQL-like interface, where would the distinction be between Pig and Hive? Certainly, from a user's perspective, there would be very little difference. If the only difference turns out to be the execution performance achieved by one interface over another, where would this put the inferior interface (be that either Pig or Hive) in terms of its relevance in the Hadoop software stack? Many thanks, Rob Stewart
RE: Make me as a member of hive developer
Hi Mohan, The instructions to subscribe to the mailing list are here... http://hadoop.apache.org/hive/mailing_lists.html#Developers Ashish -Original Message- From: Mohan Agarwal [mailto:mohan.agarwa...@gmail.com] Sent: Monday, November 02, 2009 8:45 AM To: hive-dev@hadoop.apache.org Subject: Make me as a member of hive developer
[jira] Commented: (HIVE-884) Metastore Server should exit if error happens
[ https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766966#action_12766966 ] Ashish Thusoo commented on HIVE-884: Can we add a test case? Otherwise, the changes look good. Metastore Server should exit if error happens - Key: HIVE-884 URL: https://issues.apache.org/jira/browse/HIVE-884 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.1, 0.5.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-884.1.patch Currently, HiveMetaStore (the thrift server) is not exiting when the main thread sees an Exception. The process should exit when that happens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-885) Better error messages for debugging serde problem at reducer input
[ https://issues.apache.org/jira/browse/HIVE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766983#action_12766983 ] Ashish Thusoo commented on HIVE-885: Will values.next() always return a BytesWritable? Better error messages for debugging serde problem at reducer input -- Key: HIVE-885 URL: https://issues.apache.org/jira/browse/HIVE-885 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-885.1.patch Sometimes we are seeing serde exceptions at the reducer side with hadoop 0.20. This should help debug the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[ANNOUNCE] Hive 0.4.0 released
Hi Folks, We have released the rc2 candidate that Namit had generated as Hive 0.4.0. You can download it from the download page: http://hadoop.apache.org/hive/releases.html#Download Thanks, Ashish
RE: Hive and MapReduce
Adding the hive-user and hive-dev lists, and removing the common mailing list. Can you elaborate a bit on the data size? By default, Hive should just be relying on Hadoop to give you the number of mappers depending on the number of splits you have in your data. Ashish -Original Message- From: Touretsky, Gregory [mailto:gregory.touret...@intel.com] Sent: Monday, October 12, 2009 3:02 AM To: Touretsky, Gregory; common-u...@hadoop.apache.org Subject: RE: Hive and MapReduce Ok, the patch below actually works. Re-built the Hadoop cluster and everything works now. Now I have to understand how to force Hive to run more than 1 mapper for a complicated query on the large table... From: Touretsky, Gregory Sent: Sunday, October 11, 2009 4:39 PM To: common-u...@hadoop.apache.org Cc: Touretsky, Gregory Subject: Hive and MapReduce Hi, I'm running Hadoop 0.20.1 and Hive (checked out revision 824063). Direct MapReduce task succeeds, but Map task created by Hive fails: hive> select * from pokes where foo > 100; Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_200910111626_0001, Tracking URL = http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910111626_0001 Kill Command = /nfs/iil/disks/rep_tests_gtouret01/hadoop/bin/hadoop job -Dmapred.job.tracker=itstl0016.iil.intel.com:9001 -kill job_200910111626_0001 2009-10-11 04:26:57,844 map = 100%, reduce = 100% Ended Job = job_200910111626_0001 with errors FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver From the logs/hadoop--jobtracker-.iil.intel.com.log: 2009-10-11 16:26:56,829 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_200910111626_0001 2009-10-11 16:26:57,091 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200910111626_0001 = 13. Number of splits = 1 2009-10-11 16:26:57,225 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /IDC1-DC201/WE/34 (I've had the same issue with the /default_rack) at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2009-10-11 16:26:57,225 INFO org.apache.hadoop.mapred.JobTracker: Failing job job_200910111626_0001 2009-10-11 16:26:57,866 INFO org.apache.hadoop.mapred.JobTracker: Killing job job_200910111626_0001 Any suggestion? I saw patches in https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712524#action_12712524, but I can't apply all of them cleanly to my Hadoop sources... Thanks, Gregory
[jira] Updated: (HIVE-805) Session level metastore
[ https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-805: --- Attachment: HIVE-805-1.patch Incorporated Prasad's review comments. I have not yet disabled this for partitioned tables, though. Session level metastore --- Key: HIVE-805 URL: https://issues.apache.org/jira/browse/HIVE-805 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.2.0 Reporter: Ashish Thusoo Assignee: Ashish Thusoo Fix For: 0.5.0 Attachments: HIVE-805-1.patch, HIVE-805.patch Implement a shadow metastore that is in memory and runs for a session. This can contain definitions for session specific views that can be used to implement data flow variables in Hive. It can also be used for testing scripts. First we will support the latter use case, wherein all the DDL statements in the session create objects in the session metastore and all the queries are converted to explain internal. Any thoughts on load commands? This feature is enabled when set hive.session.test = true is done in the session. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] vote for release candidate for hive
+1. Also sending this to the PMC for approval. Hi PMC, The release candidate that Namit prepared can be found at the following location: http://people.apache.org/~namit/hive-0.4.0-candidate-2/ It has the Hive 0.4.0 releases for Hadoop 0.17, 0.18, 0.19 and 0.20. Please try it out and vote on it. Thanks, Ashish From: Min Zhou [coderp...@gmail.com] Sent: Tuesday, September 29, 2009 6:35 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive I saw it, +1 for all tests passed. On Wed, Sep 30, 2009 at 1:59 AM, Namit Jain nj...@facebook.com wrote: I did find the files: [nj...@dev029 /tmp]$ ls -lrt hive-0.4.0-dev-hadoop-0.19.0/src total 33580 drwxr-xr-x 4 njain users 4096 Aug 11 16:41 docs drwxr-xr-x 7 njain users 4096 Aug 11 16:41 data -rw-r--r-- 1 njain users 15675 Aug 11 16:41 README.txt -rw-r--r-- 1 njain users 2810 Sep 2 10:44 TestTruncate.launch -rw-r--r-- 1 njain users 2804 Sep 2 10:44 TestMTQueries.launch -rw-r--r-- 1 njain users 2807 Sep 2 10:44 TestJdbc.launch -rw-r--r-- 1 njain users 2808 Sep 2 10:44 TestHive.launch -rw-r--r-- 1 njain users 2805 Sep 2 10:44 TestCliDriver.launch -rw-r--r-- 1 njain users 17045 Sep 10 15:16 build.xml -rw-r--r-- 1 njain users 850 Sep 10 15:16 build.properties -rw-r--r-- 1 njain users 12520 Sep 10 15:16 build-common.xml -rw-r--r-- 1 njain users 33431 Sep 17 18:15 CHANGES.txt -rw-r--r-- 1 njain users 1071 Sep 18 13:26 runscr -rw-r--r-- 1 njain users 23392371 Sep 18 13:26 hive-0.4.0-hadoop-0.20.0-dev.tar.gz -rw-r--r-- 1 njain users 10735695 Sep 18 13:27 hive-0.4.0-hadoop-0.20.0-bin.tar.gz drwxr-xr-x 3 njain users 4096 Sep 29 10:54 jdbc drwxr-xr-x 2 njain users 4096 Sep 29 10:54 ivy drwxr-xr-x 4 njain users 4096 Sep 29 10:54 hwi drwxr-xr-x 4 njain users 4096 Sep 29 10:54 eclipse-templates drwxr-xr-x 3 njain users 4096 Sep 29 10:54 contrib drwxr-xr-x 2 njain users 4096 Sep 29 10:54 conf drwxr-xr-x 3 njain users 4096 Sep 29 10:54 common drwxr-xr-x 4 njain users 4096 Sep 29 10:54 cli drwxr-xr-x 3 njain users 4096 Sep 29 10:54 ant drwxr-xr-x 2 njain users 4096 Sep 29 10:54 testutils drwxr-xr-x 2 njain users 4096 Sep 29 10:54 testlibs drwxr-xr-x 3 njain users 4096 Sep 29 10:54 shims drwxr-xr-x 6 njain users 4096 Sep 29 10:54 service drwxr-xr-x 4 njain users 4096 Sep 29 10:54 serde drwxr-xr-x 5 njain users 4096 Sep 29 10:54 ql drwxr-xr-x 4 njain users 4096 Sep 29 10:54 odbc drwxr-xr-x 6 njain users 4096 Sep 29 10:54 metastore drwxr-xr-x 2 njain users 4096 Sep 29 10:54 lib drwxr-xr-x 3 njain users 4096 Sep 29 10:54 bin I have attached the output. -Original Message- From: Min Zhou [mailto:coderp...@gmail.com] Sent: Tuesday, September 22, 2009 6:29 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive Hi Namit I meant http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.19.0-dev.tar.gz Min On Wed, Sep 23, 2009 at 5:31 AM, Namit Jain nj...@facebook.com wrote: Which one are you looking at ? I downloaded just now from: http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.20.0-dev.tar.gz and it contains CHANGES.txt and build.xml etc. Did you download the binary tarball ?
Thanks, -namit -Original Message- From: Min Zhou [mailto:coderp...@gmail.com] Sent: Monday, September 21, 2009 7:46 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive Hi Namit, I haven't found build.xml, CHANGES.txt from your tarball. They must be included so that we can test it and check the changes, I think. Thanks, Min On Sat, Sep 19, 2009 at 4:42 AM, Namit Jain nj...@facebook.com wrote: It is available from http://people.apache.org/~namit/ Thanks, -namit -Original Message- From: Ashish Thusoo Sent: Thursday, September 17, 2009 11:55 PM To: hive-dev@hadoop.apache.org; Namit Jain Subject: RE: [VOTE] vote for release candidate for hive Namit, Can you make it available from http://people.apache.org/~njain/ That way people who do not have access to the apache machines will also be able to try the candidate. Thanks, Ashish
[ANNOUNCE] Edward Capriolo as a Hive committer
Hi Folks, We are happy to add Edward as a committer to the Hive project. Edward has made many contributions to Hive over the last year, including the Hive Web Interface. My heartiest congratulations and a warm welcome to him in the Hive committers group. Cheers, Ashish
RE: [VOTE] vote for release candidate for hive
Namit, Can you make it available from http://people.apache.org/~njain/ That way people who do not have access to the apache machines will also be able to try the candidate. Thanks, Ashish From: Namit Jain [nj...@facebook.com] Sent: Thursday, September 17, 2009 6:32 PM To: Namit Jain; hive-dev@hadoop.apache.org Subject: [VOTE] vote for release candidate for hive Following the convention -Original Message- From: Namit Jain Sent: Thursday, September 17, 2009 6:31 PM To: hive-dev@hadoop.apache.org Subject: vote for release candidate for hive I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838 The tar ball can be found at: people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz* Thanks, -namit
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756823#action_12756823 ] Ashish Thusoo commented on HIVE-78: --- @Min I agree with Edward's thoughts here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all. It does make sense to keep authentication separate from authorization because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication. Will look at your patch as well. Authentication infrastructure for Hive -- Key: HIVE-78 URL: https://issues.apache.org/jira/browse/HIVE-78 Project: Hadoop Hive Issue Type: New Feature Components: Server Infrastructure Reporter: Ashish Thusoo Assignee: Edward Capriolo Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff Allow hive to integrate with existing user repositories for authentication and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: vote for a release candidate
Clearly, even with the fix, it is still dangerous for them to use LOAD INTO unless they understand the consistency implications or have put workarounds in place to address some reader crashes. I agree, though, that since this is a regression, we should get the functionality back to what it was in 0.3. Ashish -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Saturday, September 12, 2009 3:45 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate Hi Namit, Yes, we have customers who are using LOAD INTO without OVERWRITE. The use case is for collecting session data into a table partitioned by the hour of session start time. Since sessions are of varying lengths, incremental loads are necessary as sessions finish up. There are a couple of possible workarounds, but all of them have drawbacks. -Todd On Thu, Sep 10, 2009 at 6:58 PM, Namit Jain nj...@facebook.com wrote: I am not sure 718 is a valid requirement. I think it got in by legacy. Should we even support LOAD INTO ? We only support INSERT OVERWRITE, similarly, we should only support LOAD OVERWRITE INTO. Is anyone using LOAD INTO without OVERWRITE ? Thanks, -namit -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, September 10, 2009 4:28 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate What do you guys think the feasibility of HIVE-718 being fixed for 0.4.0 is? I think a completely correct solution is likely to be very tough to achieve, but as is it's a regression from 0.3.0 in that the functionality silently fails. -Todd On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote: I have created a release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/ Let me know if it is OK to publish this release candidate. Thanks, -namit
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755292#action_12755292 ] Ashish Thusoo commented on HIVE-718: +1 Looks good to me. Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Namit Jain Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt, hive.718.1.patch The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754058#action_12754058 ] Ashish Thusoo commented on HIVE-718: Apologies for not following this earlier. It caught my attention as Todd brought up whether we should get this into the 0.4.0 release as this is a regression when compared to 0.3.0. I checked the code on 0.3.0 and it seems to be the same as that in 0.4.0, so I am not sure if this is a regression. If this is not a regression then potentially we can go out with 0.4.0 without this and document this? As is evident from this discussion, LOAD INTO and its cousin INSERT INTO (when we have it) are very tricky. Almost all our code has been written with the overwrite semantics. Appending new data to an existing partition would need more work to get right, and I feel we should punt on it and document that INSERT INTO is not reliable - I think it has never been reliable. In order to safely implement the INSERT INTO and LOAD INTO semantics, one approach is to introduce a notion of versions on the DML commands which is encoded in the directory structure, i.e. instead of storing things as xyz/part- we store the files as xyz/v1/part- and so on. We store the latest created version in the metastore entry for that table. When a reader comes in, it first looks at this entry and then finds the corresponding version in the table. The versions themselves could be garbage collected by deleting version directories that are older than, say, some configurable duration, and this could either be done lazily by a writer on the table or by an active garbage collector in the background. These are of course somewhat involved changes, and they would solve the isolation and atomicity problems. The latter because v1 is a directory, so moving data to that directory would be a rename and hence atomic. Thoughts? Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
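The versioning proposal in the comment above reduces to a small protocol: a writer materializes version N+1 off to the side, then publishes it with a single atomic step; readers resolve the latest-version pointer before touching any files; a collector reclaims old version directories. A sketch of that protocol against the local filesystem, with an in-memory stand-in for the metastore pointer -- this is the idea, not Hive code:
{code}
import java.io.File;

public class VersionedTableSketch {
  private final File tableDir;
  private volatile int latest = 0; // stand-in for the metastore entry

  VersionedTableSketch(File tableDir) {
    this.tableDir = tableDir;
    tableDir.mkdirs();
  }

  // Writer: build v(N+1) completely, then publish with one pointer bump,
  // so readers never observe a half-written directory.
  void appendLoad() {
    int next = latest + 1;
    File versionDir = new File(tableDir, "v" + next);
    versionDir.mkdir();
    // ... copy the previous version's files plus the newly loaded
    // files into versionDir here ...
    latest = next; // the atomic publish step
  }

  // Reader: resolve the pointer first, then read only that directory.
  // Old versions stay readable until garbage collection removes them.
  File snapshot() {
    return new File(tableDir, "v" + latest);
  }

  public static void main(String[] args) {
    VersionedTableSketch t =
        new VersionedTableSketch(new File("/tmp/xyz_versioned_demo"));
    t.appendLoad();
    t.appendLoad();
    System.out.println("readers see: " + t.snapshot()); // .../v2
  }
}
{code}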
RE: vote for a release candidate
Just replied on the JIRA. Is this really a regression - the code in 0.3.0 and 0.4.0 seems similar... Ashish From: Namit Jain [nj...@facebook.com] Sent: Thursday, September 10, 2009 6:58 PM To: hive-dev@hadoop.apache.org Subject: RE: vote for a release candidate I am not sure 718 is a valid requirement. I think it got in by legacy. Should we even support LOAD INTO ? We only support INSERT OVERWRITE, similarly, we should only support LOAD OVERWRITE INTO. Is anyone using LOAD INTO without OVERWRITE ? Thanks, -namit -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, September 10, 2009 4:28 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate What do you guys think the feasibility of HIVE-718 being fixed for 0.4.0 is? I think a completely correct solution is likely to be very tough to achieve, but as is it's a regression from 0.3.0 in that the functionality silently fails. -Todd On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote: I have created a release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/ Let me know if it is OK to publish this release candidate. Thanks, -namit
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754397#action_12754397 ] Ashish Thusoo commented on HIVE-718: @prasad, can you explain your comment about the external process stuff? Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.