[jira] Commented: (HIVE-1386) HiveQL SQL Compliance (Umbrella)

2010-06-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876556#action_12876556
 ] 

Ning Zhang commented on HIVE-1386:
--

@jeff, ORDER BY is already supported.

In addition, I see some requests for the following (sorry, I couldn't find out 
whether they have a JIRA open since the web site is extremely slow); both are 
sketched below:
 - UNION ALL without the outer SELECT * wrapper
 - INSERT INTO/APPEND in addition to INSERT OVERWRITE
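
To make these concrete, here is a rough sketch of the current and requested 
forms (the table names t1, t2, and t3 are hypothetical):

{code}
-- Today a top-level UNION ALL must be wrapped in a FROM-clause subquery:
SELECT * FROM (
  SELECT key, value FROM t1
  UNION ALL
  SELECT key, value FROM t2
) u;

-- Requested: allow the unwrapped form directly
SELECT key, value FROM t1
UNION ALL
SELECT key, value FROM t2;

-- Requested: append to a table instead of replacing its contents
INSERT INTO TABLE t3 SELECT key, value FROM t1;
{code}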

> HiveQL SQL Compliance (Umbrella)
> 
>
> Key: HIVE-1386
> URL: https://issues.apache.org/jira/browse/HIVE-1386
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>
> This is an umbrella ticket to track work related to HiveQL compliance with 
> the SQL standard, e.g. supported query syntax, data types, views, catalog 
> access, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1386) HiveQL SQL Compliance (Umbrella)

2010-06-07 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876554#action_12876554
 ] 

Jeff Hammerbacher commented on HIVE-1386:
-

Do we need ORDER BY as well (HIVE-61)?

> HiveQL SQL Compliance (Umbrella)
> 
>
> Key: HIVE-1386
> URL: https://issues.apache.org/jira/browse/HIVE-1386
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>
> This is an umbrella ticket to track work related to HiveQL compliance with 
> the SQL standard, e.g. supported query syntax, data types, views, catalog 
> access, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-07 Thread Alex Kozlov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876509#action_12876509
 ] 

Alex Kozlov commented on HIVE-1369:
---

Here is a more complete description of how to use the new functionality.

Let's say you have a Writable object in a SequenceFile, and that it is an 
implementation of a Session class containing an array of events, where each 
Event object is associated with a type, a timestamp, and a Map.

You can define the following table in Hive:

CREATE EXTERNAL TABLE session (
  uid STRING,
  events ARRAY<STRUCT<type:INT, ts:BIGINT, map:MAP<STRING,STRING>>>
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.LazySimpleSerDe'
STORED AS SEQUENCEFILE
LOCATION 'location_of_your_sequence_file_with_your_writable_as_value';

Instead of implementing a fully functional SerDe for this class (even though 
it's probably a good exercise in the long run), with HIVE-1369 one can just 
write a toString(byte[]) method for the above Writable:

public String toString(byte[] sep) {
  StringBuffer sb = new StringBuffer();
  sb.append(getUId());
  sb.append((char) sep[0]);          // top-level field separator
  boolean firstEvent = true;
  for (Event event : getEvents()) {
    if (firstEvent) {
      firstEvent = false;
    } else {
      sb.append((char) sep[1]);      // separator between array elements
    }
    sb.append(event.getType());
    sb.append((char) sep[2]);        // separator between struct fields
    sb.append(event.getTimestamp());
    sb.append((char) sep[2]);
    Map<String, String> map = event.getMap();
    if (map != null && !map.isEmpty()) {
      boolean firstKey = true;
      for (String key : map.keySet()) {
        if (firstKey) {
          firstKey = false;
        } else {
          sb.append((char) sep[3]);  // separator between map entries
        }
        sb.append(key);
        sb.append((char) sep[4]);    // separator between map key and value
        sb.append(map.get(key));
      }
    } else {
      sb.append("\\N");              // Hive's default null marker
    }
  }
  return sb.toString();
}

This will obviously be less efficient than implementing a full SerDe, but it is 
much more flexible and faster to write.
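
As a usage sketch, once the table above is in place the nested fields can be 
addressed directly from HiveQL (hypothetical query, assuming the data parses 
cleanly):

{code}
SELECT uid, events[0].type, events[0].ts, events[0].map['some_key']
FROM session
LIMIT 10;
{code}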

The default Java implementation is toString() with no parameters, so there is 
no conflict here.  I was thinking about adding other parameters, like the null 
string or the escape char, but decided to keep it simple.  There is also an 
option to use JSON serialization (probably slower).

Alex K


> LazySimpleSerDe should be able to read classes that support some form of 
> toString()
> ---
>
> Key: HIVE-1369
> URL: https://issues.apache.org/jira/browse/HIVE-1369
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
>Priority: Minor
> Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
> objects.  It should be pretty easy to extend the class to read any object 
> that implements a toString() method.
> Ideas or concerns?
> Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1393) Column pruner prunes essential columns in map join

2010-06-07 Thread Paul Yang (JIRA)
Column pruner prunes essential columns in map join
--

 Key: HIVE-1393
 URL: https://issues.apache.org/jira/browse/HIVE-1393
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Paul Yang


Run: 
{code}
CREATE TABLE tmp_pyang_src2 (key STRING, value STRING);
INSERT OVERWRITE TABLE tmp_pyang_src2 SELECT key, value FROM src;

SELECT  /*+ MAPJOIN(a) */ a.key, a.value
FROM tmp_pyang_src2 a JOIN src b ON a.key=b.key
LIMIT 10;
{code}

a.value will show up as null. Either removing the MAPJOIN hint or setting 
hive.optimize.cp=false will produce the correct results, as shown below.
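
For reference, the second workaround from the description looks like this:

{code}
set hive.optimize.cp=false;

SELECT /*+ MAPJOIN(a) */ a.key, a.value
FROM tmp_pyang_src2 a JOIN src b ON a.key=b.key
LIMIT 10;
{code}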

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1383) allow HBase WAL to be disabled

2010-06-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876480#action_12876480
 ] 

Ning Zhang commented on HIVE-1383:
--

Looks good in general, except one nitpick: should we move HBASE_WAL_ENABLED = 
"hive.hbase.wal.enabled" to HiveConf.java and add an entry to hive-default.xml 
(sketched below)? It would make it easier for users to browse all Hive settings 
and their default values from a centralized conf file. 
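
For illustration only, such a hive-default.xml entry might look something like 
this (the description text is a sketch, not the actual patch; the default of 
true mirrors HBase's normal WAL behavior):

{code}
<property>
  <name>hive.hbase.wal.enabled</name>
  <value>true</value>
  <description>Whether writes to HBase should be forced to the
  write-ahead log. Disabling this improves INSERT throughput at the
  risk of data loss if HBase crashes.</description>
</property>
{code}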

> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1383) allow HBase WAL to be disabled

2010-06-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876464#action_12876464
 ] 

Ning Zhang commented on HIVE-1383:
--

I will take a look.

> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-07 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.5.patch

Updated to current trunk.
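
For illustration, the ARCHIVE/UNARCHIVE commands described below would look 
roughly like this (table name and partition spec are hypothetical):

{code}
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-06-01');
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-06-01');
{code}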

> Archiving partitions
> 
>
> Key: HIVE-1332
> URL: https://issues.apache.org/jira/browse/HIVE-1332
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
> HIVE-1332.4.patch, HIVE-1332.5.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An 
> issue is that as the number of files increases, there will be higher 
> memory/load requirements on the namenode. Partitions in bucketed tables are a 
> particular problem because they consist of many files, one for each of the 
> buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table> ARCHIVE PARTITION 
> <spec> that would automatically put the files for the partition into a HAR 
> file. We would also have an UNARCHIVE option to convert the files in the 
> partition back to the original files. Archived partitions would be slower to 
> access, but they would have the same functionality and decrease the number of 
> files drastically. Typically, only seldom accessed partitions would be 
> archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the 
> latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
> potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-07 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Status: Patch Available  (was: Open)

> Archiving partitions
> 
>
> Key: HIVE-1332
> URL: https://issues.apache.org/jira/browse/HIVE-1332
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
> HIVE-1332.4.patch, HIVE-1332.5.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An 
> issue is that as the number of files increases, there will be higher 
> memory/load requirements on the namenode. Partitions in bucketed tables are a 
> particular problem because they consist of many files, one for each of the 
> buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table> ARCHIVE PARTITION 
> <spec> that would automatically put the files for the partition into a HAR 
> file. We would also have an UNARCHIVE option to convert the files in the 
> partition back to the original files. Archived partitions would be slower to 
> access, but they would have the same functionality and decrease the number of 
> files drastically. Typically, only seldom accessed partitions would be 
> archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the 
> latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
> potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #464

2010-06-07 Thread Apache Hudson Server
See 

--
[...truncated 14075 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Out

Hudson build is back to normal : Hive-trunk-h0.17 #462

2010-06-07 Thread Apache Hudson Server
See 




[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-06-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876250#action_12876250
 ] 

Edward Capriolo commented on HIVE-1373:
---


{quote}
1 copy is anyway done from lib to dist/lib for these jars. If we go directly to 
ivy we would copy things from the ivy cache to dist/lib. So the number of 
copies in the build process
would remain the same, no? There is of course the first time overhead of 
downloading these jars from their repos to the ivy cache.
{quote}

I follow what you are thinking. Currently the code I wrote takes specific jars 
from the metastore ivy downloads. We could probably have ivy download directly 
to build/lib (a sketch is below). I just think we should watch to make sure 
that many unneeded jars do not appear.
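
For illustration only, a minimal sketch of retrieving straight into build/lib 
with Ivy's Ant tasks (the configuration name and build.dir property are 
assumptions, not the actual build file):

{code}
<!-- Hypothetical sketch: resolve and copy jars straight into build/lib -->
<ivy:resolve conf="default"/>
<ivy:retrieve conf="default"
              pattern="${build.dir}/lib/[artifact]-[revision].[ext]"/>
{code}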

> Missing connection pool plugin in Eclipse classpath
> ---
>
> Key: HIVE-1373
> URL: https://issues.apache.org/jira/browse/HIVE-1373
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
> Environment: Eclipse, Linux
>Reporter: Vinithra Varadharajan
>Assignee: Vinithra Varadharajan
> Attachments: HIVE-1373.patch
>
>
> In a recent checkin, connection pool dependency was introduced but eclipse 
> .classpath file was not updated.  This causes launch configurations from 
> within Eclipse to fail.
> {code}
> hive> show tables;
> show tables;
> 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
> 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
> 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
> 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
> deserializer)], properties:null)
> 10/05/26 14:59:46 INFO ql.Driver: query plan = 
> file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
> 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
> 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
> FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
> creating transactional connection factory
> NestedThrowables:
> java.lang.reflect.InvocationTargetException
> 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
> javax.jdo.JDOFatalInternalException: Error creating transactional connection 
> factory
> NestedThrowables:
> java.lang.reflect.InvocationTargetException
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Error creating transactional connection 
> factory
> NestedThrowables:
> java.lang.reflect.InvocationTargetException
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
> Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
> connection factory
> NestedThrowables:
> java.lang.reflect.InvocationTargetException
>   at 
> org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
>   at 
> org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
>   at 
> org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.jdo.JDOHelper.invoke(JDOHelper.java: