[jira] Created: (HIVE-1918) Add export/import facilities to the hive system

2011-01-17 Thread Krishna Kumar (JIRA)
Add export/import facilities to the hive system
---

 Key: HIVE-1918
 URL: https://issues.apache.org/jira/browse/HIVE-1918
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Krishna Kumar


This is an enhancement request to add export/import features to Hive.

With this language extension, the user can export the data of a table (which 
may be located in different HDFS locations in the case of a partitioned table) 
as well as the metadata of the table into a specified output location. This 
output location can then be moved over to a different Hadoop/Hive instance and 
imported there.

This should work independently of the source and target metastore DBMS used; 
for instance, between Derby and MySQL.

For partitioned tables, the ability to export/import a subset of the partitions 
must be supported.

Howl will add more features on top of this: the ability to create/use the 
exported data even in the absence of Hive, using MR or Pig. Please see 
http://wiki.apache.org/pig/Howl/HowlImportExport for these details.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.20 #493

2011-01-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/493/

--
[...truncated 21415 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 

[jira] Commented: (HIVE-1211) Tapping logs from child processes

2011-01-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982917#action_12982917
 ] 

Carl Steinbach commented on HIVE-1211:
--

Can someone please take a look at this? Thanks!

 Tapping logs from child processes
 -

 Key: HIVE-1211
 URL: https://issues.apache.org/jira/browse/HIVE-1211
 Project: Hive
  Issue Type: Improvement
  Components: Logging
Reporter: bc Wong
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-1211-2.patch, HIVE-1211.1.patch, 
 HIVE-1211.3.patch.txt, HIVE-1211.4.patch.txt, HIVE-1211.5.patch.txt, 
 HIVE-1211.6.patch.txt


 Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
 the parent's stdout/stderr. There is little one can do to sort out which 
 log is from which query.




[jira] Commented: (HIVE-1694) Accelerate query execution using indexes

2011-01-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982999#action_12982999
 ] 

John Sichi commented on HIVE-1694:
--

Is there an updated tree for this?  I checked GitHub but didn't see it.  
HIVE-1644 needs support for compiling internally-generated SQL into operators, 
so if you have that working, I'd like to point the Harvey Mudd folks at it when 
I talk to them tomorrow.


 Accelerate query execution using indexes
 

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
 Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above mentioned cases. 




[jira] Updated: (HIVE-1903) Can't join HBase tables if one's name is the beginning of the other

2011-01-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1903:
-

Component/s: HBase Handler

 Can't join HBase tables if one's name is the beginning of the other
 ---

 Key: HIVE-1903
 URL: https://issues.apache.org/jira/browse/HIVE-1903
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Jean-Daniel Cryans
Assignee: John Sichi
 Fix For: 0.7.0

 Attachments: HIVE-1903.1.patch


 I tried joining two tables, let's call them table and table_a, but I'm 
 seeing an array of errors such as this:
 {noformat}
 java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:118)
   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:231)
 {noformat}
 The reason is that HiveInputFormat.pushProjectionsAndFilters matches the 
 aliases with startsWith, so in my case the mappers for table_a were getting 
 the columns from table as well as their own (and since table_a had fewer 
 columns, it was trying to read one past the end of the array).
 I don't know if just changing it to equals will fix it; my guess is it 
 won't, since it may break RCFiles.
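
The prefix-matching problem described in this report can be reproduced in isolation. What follows is only a minimal sketch, not the actual HiveInputFormat code: the class AliasMatchDemo, its methods, the direction of the startsWith call, and the column indexes are all assumptions made for illustration. It shows how a startsWith comparison lets the alias "table" match the split for "table_a", so that a column index valid only for the wider table leaks into the narrower one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical demo class; it does not mirror Hive's real data structures. */
final class AliasMatchDemo {

    /** Illustrative projections: "table" reads 4 columns, "table_a" reads 3. */
    static Map<String, List<Integer>> sampleProjections() {
        Map<String, List<Integer>> byAlias = new LinkedHashMap<>();
        byAlias.put("table", Arrays.asList(0, 1, 2, 3));
        byAlias.put("table_a", Arrays.asList(0, 1, 2));
        return byAlias;
    }

    /**
     * Returns the sorted union of column ids of every alias that matches
     * splitAlias, using either a prefix test or an exact test.
     */
    static List<Integer> projectedColumns(String splitAlias,
                                          Map<String, List<Integer>> byAlias,
                                          boolean prefixMatch) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> entry : byAlias.entrySet()) {
            String alias = entry.getKey();
            boolean matches = prefixMatch
                    ? splitAlias.startsWith(alias)   // "table_a" also matches "table"
                    : splitAlias.equals(alias);      // only the alias itself matches
            if (matches) {
                ids.addAll(entry.getValue());
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> byAlias = sampleProjections();
        // A made-up row of table_a with exactly three columns.
        List<String> tableARow = Arrays.asList("k", "v1", "v2");

        System.out.println("prefix match: " + projectedColumns("table_a", byAlias, true));
        System.out.println("exact match:  " + projectedColumns("table_a", byAlias, false));

        try {
            // With prefix matching, column id 3 is requested from a 3-column row.
            for (int id : projectedColumns("table_a", byAlias, true)) {
                tableARow.get(id);
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

The sketch says nothing about whether an exact comparison is safe for other input formats; the reporter's concern that it may break RCFiles is outside what this example covers.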




[jira] Updated: (HIVE-1716) TestHBaseCliDriver failed in current trunk

2011-01-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1716:
-

Component/s: HBase Handler

 TestHBaseCliDriver failed in current trunk
 --

 Key: HIVE-1716
 URL: https://issues.apache.org/jira/browse/HIVE-1716
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: John Sichi
 Fix For: 0.7.0


 ant test -Dhadoop.version=0.20.0 -Dtestcase=TestHBaseCliDriver:

 [junit] org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:976)
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:607)
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:738)
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
 [junit] at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
 [junit] at org.apache.hadoop.hbase.client.HTable.init(HTable.java:128)
 [junit] at org.apache.hadoop.hive.hbase.HBaseTestSetup.setUpFixtures(HBaseTestSetup.java:87)
 [junit] at org.apache.hadoop.hive.hbase.HBaseTestSetup.preTest(HBaseTestSetup.java:59)
 [junit] at org.apache.hadoop.hive.hbase.HBaseQTestUtil.init(HBaseQTestUtil.java:31)
 [junit] at org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:43)
 [junit] at junit.framework.TestCase.runBare(TestCase.java:125)
 [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
 [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
 [junit] at junit.framework.TestResult.run(TestResult.java:109)
 [junit] at junit.framework.TestCase.run(TestCase.java:118)
 [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
 [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
