[jira] Updated: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary
[ https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1417: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed committed. Thanks Paul Archived partitions throw error with queries calling getContentSummary -- Key: HIVE-1417 URL: https://issues.apache.org/jira/browse/HIVE-1417 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch Assuming you have a src table with a ds='1' partition that is archived in HDFS, the following query will throw an exception {code} select count(1) from src where ds='1' group by key; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-535) Memory-efficient hash-based Aggregation
[ https://issues.apache.org/jira/browse/HIVE-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881121#action_12881121 ] He Yongqiang commented on HIVE-535: --- Has anyone tried google sparsehash http://code.google.com/p/google-sparsehash/ ? It's BSD license. But it seems it is in C++, and there is no Java version. Memory-efficient hash-based Aggregation --- Key: HIVE-535 URL: https://issues.apache.org/jira/browse/HIVE-535 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Currently there is a lot of memory overhead in the hash-based aggregation in GroupByOperator. The net result is that GroupByOperator cannot store many entries in its HashTable, flushes frequently, and cannot achieve a very good partial aggregation result. Here are some initial thoughts (some of them are from Joydeep a long time ago): A1. Serialize the key of the HashTable. This will eliminate the 16-byte per-object overhead of Java in keys (depending on how many objects there are in the key, the saving can be substantial). A2. Use more memory-efficient hash tables - java.util.HashMap has about 64 bytes of overhead per entry. A3. Use primitive arrays to store aggregation results. Basically, the UDAF should manage the array of aggregation results, so UDAFCount should manage a long[], UDAFAvg should manage a double[] and a long[]. The external code should pass an index to iterate/merge/terminate an aggregation result. This will eliminate the 16-byte per-object overhead of Java. More ideas are welcome.
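Idea A3 can be sketched in a few lines of plain Java (a hedged illustration only; the class and method names are made up and this is not Hive's actual UDAF interface): each aggregation buffer becomes one slot in a shared long[], so the per-group cost is 8 bytes rather than a full Java object.

{code}
// Sketch of idea A3: partial counts live in one primitive long[] instead of
// one wrapper object per group, avoiding Java's per-object overhead.
// (Illustrative names only; not Hive's real UDAF API.)
public class PrimitiveCountAggregator {
    private final long[] counts;   // one slot per aggregation buffer

    public PrimitiveCountAggregator(int capacity) {
        counts = new long[capacity];
    }

    // The caller passes an index instead of an aggregation object.
    public void iterate(int idx) { counts[idx]++; }              // one input row
    public void merge(int idx, long partial) { counts[idx] += partial; }
    public long terminate(int idx) { return counts[idx]; }
}
{code}

A UDAFAvg-style aggregator would hold a double[] of sums and a long[] of counts side by side in the same way.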
[jira] Updated: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name
[ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-1342: - Attachment: ppd_same_alias_1.patch I think PPD is unnecessarily resolving table aliases when encountering CommonJoinOperator. I attached a patch fixing it. Please have a review. Predicate push down get error result when sub-queries have the same alias name --- Key: HIVE-1342 URL: https://issues.apache.org/jira/browse/HIVE-1342 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.4.2, 0.5.0 Reporter: Ted Xu Priority: Critical Attachments: cmd.hql, explain, ppd_same_alias_1.patch Query is over-optimized by PPD when sub-queries have the same alias name, see the query:
{code}
create table if not exists dm_fact_buyer_prd_info_d (
  category_id string,
  gmv_trade_num int,
  user_id int
) PARTITIONED BY (ds int);

set hive.optimize.ppd=true;
set hive.map.aggr=true;

explain
select category_id1, category_id2, assoc_idx
from (
  select category_id1, category_id2, count(distinct user_id) as assoc_idx
  from (
    select t1.category_id as category_id1,
           t2.category_id as category_id2,
           t1.user_id
    from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t1
    join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t2 on t1.user_id = t2.user_id
  ) t1
  group by category_id1, category_id2
) t_o
where category_id1 <> category_id2 and assoc_idx > 2;
{code}
The query above fails when executed, throwing the exception: cannot cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text).
I ran explain on the query and the execution plan looks really weird (only Stage-1 shown; see the highlighted predicate):
{code}
Stage: Stage-1
  Map Reduce
    Alias -> Map Operator Tree:
      t_o:t1:t1:dm_fact_buyer_prd_info_d
        TableScan
          alias: dm_fact_buyer_prd_info_d
          Filter Operator
            predicate:
                expr: *(category_id <> user_id)*
                type: boolean
          Select Operator
            expressions:
                expr: category_id
                type: string
                expr: user_id
                type: bigint
            outputColumnNames: category_id, user_id
            Group By Operator
              keys:
                    expr: category_id
                    type: string
                    expr: user_id
                    type: bigint
              mode: hash
              outputColumnNames: _col0, _col1
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: bigint
                sort order: ++
                Map-reduce partition columns:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: bigint
                tag: -1
    Reduce Operator Tree:
      Group By Operator
        keys:
              expr: KEY._col0
              type: string
              expr: KEY._col1
              type: bigint
        mode: mergepartial
        outputColumnNames: _col0, _col1
        Select Operator
          expressions:
                expr: _col0
                type: string
                expr: _col1
                type: bigint
          outputColumnNames: _col0, _col1
          File Output Operator
            compressed: true
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
{code}
If predicate push down is disabled (set hive.optimize.ppd=false), the error is gone; I tried
Hudson build is back to normal : Hive-trunk-h0.18 #480
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/480/
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881255#action_12881255 ] Namit Jain commented on HIVE-1359: -- Requirements 2 and 3 are not addressed in the above patch - talked to Ning offline, and we can do them in a follow-up. Filed a new jira for the same: https://issues.apache.org/jira/browse/HIVE-1424 Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1359.patch, unit_tests.txt Some features in Hive only work for certain Hadoop versions through shims. However, the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we have different output for different Hadoop versions. One example is CombineHiveInputFormat, which is only available from Hadoop 0.20. The plans using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archived partitions (HAR), which are also only available from 0.20.
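The shim pattern the issue builds on can be sketched roughly as follows (a hedged illustration with made-up names - this is not Hive's actual HadoopShims interface): version-specific behavior sits behind one interface, and a shim-aware test would pick its expected output through it.

{code}
// Sketch of the shim idea: per-Hadoop-version behavior behind one interface.
// (Names are illustrative, not Hive's real shim API.)
public class ShimDemo {
    interface HadoopVersionShim {
        boolean supportsCombineInputFormat();   // only true from Hadoop 0.20 on
    }
    static class Hadoop18Shim implements HadoopVersionShim {
        public boolean supportsCombineInputFormat() { return false; }
    }
    static class Hadoop20Shim implements HadoopVersionShim {
        public boolean supportsCombineInputFormat() { return true; }
    }

    // A shim-aware test would select the expected plan based on the shim,
    // instead of keeping a single expected output for all versions.
    static String expectedPlan(HadoopVersionShim shim) {
        return shim.supportsCombineInputFormat()
                ? "CombineHiveInputFormat" : "HiveInputFormat";
    }
}
{code}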
[jira] Created: (HIVE-1424) Unit tests should be shim aware
Unit tests should be shim aware --- Key: HIVE-1424 URL: https://issues.apache.org/jira/browse/HIVE-1424 Project: Hadoop Hive Issue Type: New Feature Components: Testing Infrastructure Reporter: Namit Jain Follow-up of https://issues.apache.org/jira/browse/HIVE-1359, requirements 2 and 3
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881258#action_12881258 ] Namit Jain commented on HIVE-1359: -- This is also needed for https://issues.apache.org/jira/browse/HIVE-1307 in 0.6. We can fix https://issues.apache.org/jira/browse/HIVE-1424 in 0.7; it need not be merged into 0.6.
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881274#action_12881274 ] Namit Jain commented on HIVE-1359: -- Looks good to me - John, can you also review?
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881285#action_12881285 ] John Sichi commented on HIVE-1359: -- Since we're not actually dealing with the minimr requirement in this patch, it's probably better to just leave out those changes completely and we'll address them in HIVE-117. In particular, I don't think the cluster mode should be part of the test code generation; we want it completely dynamic so that we can re-run the same test in either mode without regenerating code. Minor nitpicks (these can be fixed in the follow-up instead of now): * hadoopVersion = new String() is the same as hadoopVersion = "" * usage of Stack is deprecated since it is based on the synchronized Vector
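The second nitpick above can be illustrated with a small, hedged example: java.util.Stack inherits its per-operation synchronization from Vector, while ArrayDeque offers the same LIFO operations without locking and is the usual replacement.

{code}
// Minimal sketch: ArrayDeque in place of the Vector-based java.util.Stack.
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeInsteadOfStack {
    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");                  // same LIFO operations as Stack
        stack.push("b");
        System.out.println(stack.pop());  // b
        System.out.println(stack.peek()); // a
    }
}
{code}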
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Fix Version/s: 0.7.0 Affects Version/s: 0.6.0 (was: 0.7.0) add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
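What row_sequence() does can be sketched in a few lines (a hedged illustration, not the attached patch - the real UDF plugs into Hive's UDF machinery, which this does not): a per-instance counter that starts handing out numbers from 1.

{code}
// Sketch of a row_sequence()-style stateful function: each call returns
// the next number in sequence. As the issue warns, there is no ordering
// guarantee in SQL processing internals, so the numbering depends on
// execution order.
public class RowSequence {
    private long sequence = 0L;   // per-instance counter

    public long evaluate() {
        return ++sequence;        // first call returns 1, then 2, ...
    }
}
{code}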
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881293#action_12881293 ] Paul Yang commented on HIVE-1176: - For some reason, I don't see the JDO files being deleted after applying the patch:
{code}
?       build.xml.orig
?       HIVE-1176-2.patch
?       test.log
M       eclipse-templates/.classpath
M       build.properties
M       build.xml
?       metastore/test.log
M       metastore/ivy.xml
M       metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
!       lib/jdo2-api-2.3-SNAPSHOT.LICENSE
!       lib/datanucleus-rdbms-1.1.2.LICENSE
!       lib/datanucleus-enhancer-1.1.2.LICENSE
!       lib/datanucleus-core-1.1.2.LICENSE
M       ivy/ivysettings.xml
{code}
Also, the patch works for branch 0.6 but not for trunk. Can you regenerate it? 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
{code}
hive> create table if not exists tmp_select(s string, c string, n int);
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
	at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
	... 15 more
{code}
[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881300#action_12881300 ] Namit Jain commented on HIVE-1422: -- +1, looks good skip counter update when RunningJob.getCounters() returns null -- Key: HIVE-1422 URL: https://issues.apache.org/jira/browse/HIVE-1422 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1422.1.patch Under heavy load circumstances on some Hadoop versions, we may get an NPE from trying to dereference a null Counters object. I don't have a unit test which can reproduce it, but here's an example stack from a production cluster we saw today:
{code}
10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 with exception 'java.lang.NullPointerException(null)'
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
{code}
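The guard the issue title describes can be sketched as follows (a hedged illustration against a stubbed job handle; JobHandle and updateCounters here are stand-ins, not Hadoop's or Hive's actual signatures):

{code}
// Sketch of the "skip counter update when getCounters() returns null" idea.
// JobHandle stands in for Hadoop's RunningJob; the real Counters type is richer.
import java.util.Map;

public class CounterUpdateGuard {
    interface JobHandle {
        Map<String, Long> getCounters();   // may return null under heavy load
    }

    // Returns true if counters were applied, false if the round was skipped.
    static boolean updateCounters(JobHandle job, Map<String, Long> sink) {
        Map<String, Long> ctrs = job.getCounters();
        if (ctrs == null) {
            // Dereferencing the null result is what caused the NPE in the
            // stack above, so skip this update round instead.
            return false;
        }
        sink.putAll(ctrs);
        return true;
    }
}
{code}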
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881306#action_12881306 ] Ashish Thusoo commented on HIVE-1271: - I am looking at this. Case sensitiveness of type information specified when using custom reducer causes type mismatch --- Key: HIVE-1271 URL: https://issues.apache.org/jira/browse/HIVE-1271 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1271-1.patch, HIVE-1271.patch Type information specified while using a custom reduce script is converted to lower case, which causes a type mismatch during query semantic analysis. The following REDUCE query, where a field was named userId, failed:
{code}
hive> CREATE TABLE SS ( a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> );
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
      INSERT OVERWRITE TABLE SS
      REDUCE * USING 'myreduce.py' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.
{code}
The same query worked fine after changing userId to userid.
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881319#action_12881319 ] Ashish Thusoo commented on HIVE-1271: - Looks good to me. However, why remove the check on Category? Also, why drop the default implementation of the equals method for TypeInfo?
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Open (was: Patch Available)
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Attachment: HIVE-1304.2.patch
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Attachment: HIVE-1394.patch Adding a new hint to avoid updating transient_lastDdlTime for both tables and partitions in the metastore. do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Paul Yang Attachments: HIVE-1394.patch Currently, purging looks at the HDFS time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to Hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched in that scenario.
[jira] Assigned: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1394: Assignee: Ning Zhang (was: Paul Yang)
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Status: Patch Available (was: Open)
Hudson build is back to normal : Hive-trunk-h0.20 #302
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/302/changes
[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)
[ https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881361#action_12881361 ] John Sichi commented on HIVE-1364: -- Currently the view scripts are only in the wiki: http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades Per discussion with Ashish, we should open a separate JIRA issue for (at a minimum) packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. Carl, do you want to take that on as part of release mgmt? Increase the maximum length of SERDEPROPERTIES values (currently 767 characters) Key: HIVE-1364 URL: https://issues.apache.org/jira/browse/HIVE-1364 Project: Hadoop Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0 Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch The value component of a SERDEPROPERTIES key/value pair is currently limited to a maximum length of 767 characters. I believe that the motivation for limiting the length to 767 characters is that this value is the maximum allowed length of an index in a MySQL database running on the InnoDB engine: http://bugs.mysql.com/bug.php?id=13315 * The Metastore O/R mapping currently limits many fields (including SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite the fact that these fields are not indexed. * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535. * We can expect many users to hit the 767 character limit on SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping serde property to map a table that has many columns. I propose increasing the maximum allowed length of SERDEPROPERTIES.PARAM_VALUE to 8192.
[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1422: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks John
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881368#action_12881368 ] Arvind Prabhakar commented on HIVE-1271: @Ashish: Thanks for looking at the patch. bq. why remove the check on Category? I modified all the specialized type infos to be {{final}} - which in turn ensures that if the test on {{instanceof}} succeeds, then they have to be the same category type. Therefore, the check on category was redundant going forward. bq. Also why drop the default implementation of the equals method for TypeInfo? I did this for two main reasons - first that fact that it was implementing the {{equals()}} but not {{hashCode()}} method. This could lead to unexpected behavior when {{TypeInfo}} instances were put in collections. Second, the implementation was modified to make both {{equals()}} and {{hashCode()}} methods to be made abstract in order to force any (new) child classes to make sure that they implement both consistently. Let me know if you would like to tweak this change as necessary. Case sensitiveness of type information specified when using custom reducer causes type mismatch --- Key: HIVE-1271 URL: https://issues.apache.org/jira/browse/HIVE-1271 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1271-1.patch, HIVE-1271.patch Type information specified while using a custom reduce script is converted to lower case, and causes type mismatch during query semantic analysis . The following REDUCE query where field name = userId failed. 
hive> CREATE TABLE SS ( a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> ); OK hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s INSERT OVERWRITE TABLE SS REDUCE * USING 'myreduce.py' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> ) ; FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>. The same query worked fine after changing userId to userid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
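The equals()/hashCode() contract raised in the HIVE-1271 comment above can be illustrated with a minimal value class. `SimpleTypeInfo` is a hypothetical stand-in for Hive's `TypeInfo` hierarchy, not the real class; it just shows why overriding `equals()` without a consistent `hashCode()` misbehaves in hash-based collections.

```java
// Illustrative sketch, not Hive code: a final value class whose equals() and
// hashCode() are implemented consistently, as the HIVE-1271 patch requires of
// TypeInfo subclasses.
import java.util.HashSet;
import java.util.Set;

public final class SimpleTypeInfo {
    private final String typeName;

    public SimpleTypeInfo(String typeName) { this.typeName = typeName; }

    @Override public boolean equals(Object o) {
        // On a final class, a successful instanceof test guarantees the same
        // concrete type - which is why the separate category check became
        // redundant in the patch.
        if (!(o instanceof SimpleTypeInfo)) return false;
        return typeName.equals(((SimpleTypeInfo) o).typeName);
    }

    // Consistent with equals(): equal objects produce equal hash codes.
    @Override public int hashCode() { return typeName.hashCode(); }

    public static void main(String[] args) {
        Set<SimpleTypeInfo> set = new HashSet<>();
        set.add(new SimpleTypeInfo("int"));
        // Without a matching hashCode(), this lookup could miss the entry.
        System.out.println(set.contains(new SimpleTypeInfo("int"))); // true
    }
}
```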
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881381#action_12881381 ] Arvind Prabhakar commented on HIVE-1176: @Paul: I just tested the patch (HIVE-1176-2.patch) on latest trunk and it seems to apply cleanly. Can you please try again and see if it works? Also, can you post the errors that you are seeing? If necessary, I can break down the patch into single-file units to help with applying it. Just let me know either way. 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Attachment: HIVE-1394.2.patch Added a check in SemanticAnalyzer to throw an exception when HOLD_DDLTIME is specified in a dynamic partition insert or on non-existent static partitions. A negative test case is also added. do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1394.2.patch, HIVE-1394.patch Currently, purging looks at the hdfs time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched for that scenario. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881414#action_12881414 ] Namit Jain commented on HIVE-1394: -- +1 will commit if the tests pass do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1394.2.patch, HIVE-1394.patch Currently, purging looks at the hdfs time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched for that scenario. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)
[ https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881416#action_12881416 ] Carl Steinbach commented on HIVE-1364: -- bq. Also why do we make everything 4000 bytes - I presume things like ftype will never hit that limit. Currently the ORM is the de facto enforcement mechanism for string length limitations. I think this is a bad approach since 1) users can work around it by manually altering the underlying tables, and 2) the limits are stated in terms of bytes so the actual length restriction in terms of number of characters will depend on the character set of the underlying DB. In light of this I bumped every size limit to 4000 bytes, and also because I did not want to try to predict which property length limit someone would next bump into. I'm willing to revert these limits to their original values. Are there any properties besides ftype which you want me to revert? Should I revert everything except SERDEPROPERTIES.PARAM_VALUE? bq. Also changes to upgrade SQL should also be a part of the patch, no? Where are the scripts for the view change located? I'll update the patch with the necessary scripts. Should these go in bin/ or somewhere under metastore/ ? @John: Yes, I think this falls under the responsibility of the release manager. I will take care of it. Increase the maximum length of SERDEPROPERTIES values (currently 767 characters) Key: HIVE-1364 URL: https://issues.apache.org/jira/browse/HIVE-1364 Project: Hadoop Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0 Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch The value component of a SERDEPROPERTIES key/value pair is currently limited to a maximum length of 767 characters. 
I believe that the motivation for limiting the length to 767 characters is that this value is the maximum allowed length of an index in a MySQL database running on the InnoDB engine: http://bugs.mysql.com/bug.php?id=13315 * The Metastore OR mapping currently limits many fields (including SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite the fact that these fields are not indexed. * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535. * We can expect many users to hit the 767 character limit on SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping serdeproperty to map a table that has many columns. I propose increasing the maximum allowed length of SERDEPROPERTIES.PARAM_VALUE to 8192. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)
Provide metastore schema migration scripts (0.5 - 0.6) --- Key: HIVE-1427 URL: https://issues.apache.org/jira/browse/HIVE-1427 Project: Hadoop Hive Issue Type: Task Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 At a minimum this ticket covers packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. This is also probably a good point at which to decide and clearly state which Metastore DBs we officially support in production, e.g. do we need to provide migration scripts for Derby? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881426#action_12881426 ] HBase Review Board commented on HIVE-1416: -- Message from: John Sichi jsi...@facebook.com --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java http://review.hbase.org/r/223/#comment1126 Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. http://svn.apache.org/repos/asf/hadoop/hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java http://review.hbase.org/r/223/#comment1124 This anObject insertion looks like it was accidental. - John Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
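The file-name parsing problem described in HIVE-1416 can be sketched generically. The pattern and method names below are illustrative only; the actual output file names (and therefore the patterns) vary by Hadoop version and execution mode, which is exactly why the issue moves this parsing into shims.

```java
// Hypothetical illustration of extracting a task ID from a mapper/reducer
// output file name. Real Hadoop names range from plain "part-00005" to
// attempt-style names depending on version and local/cluster mode; this
// sketch handles only the simple "part-NNNNN" form.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskIdParseSketch {
    private static final Pattern PART = Pattern.compile("part-(\\d+)");

    // Returns the numeric task ID component, or null if the name does not
    // match the expected form (a version-specific shim would handle others).
    static String taskIdFromFileName(String name) {
        Matcher m = PART.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(taskIdFromFileName("part-00005")); // 00005
        System.out.println(taskIdFromFileName("weird-name")); // null
    }
}
```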
[jira] Updated: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)
[ https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1427: - Affects Version/s: (was: 0.5.0) Provide metastore schema migration scripts (0.5 - 0.6) --- Key: HIVE-1427 URL: https://issues.apache.org/jira/browse/HIVE-1427 Project: Hadoop Hive Issue Type: Task Components: Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 At a minimum this ticket covers packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. This is also probably a good point at which to decide and clearly state which Metastore DBs we officially support in production, e.g. do we need to provide migration scripts for Derby? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881431#action_12881431 ] John Sichi commented on HIVE-1359: -- +1. Will commit when tests pass. Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1359.2.patch, HIVE-1359.patch, unit_tests.txt Some features in Hive only work for certain Hadoop versions through shims. However the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we will have different output for different Hadoop versions. One example is CombineHiveInputFormat which is only available from Hadoop 0.20. The plan using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archival partitions (HAR) which is also only available from 0.20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1229) replace dependencies on HBase deprecated API
[ https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881432#action_12881432 ] Basab Maulik commented on HIVE-1229: ... recreating patch against current trunk. Thanks for the feedback! replace dependencies on HBase deprecated API Key: HIVE-1229 URL: https://issues.apache.org/jira/browse/HIVE-1229 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Basab Maulik Attachments: HIVE-1129.1.patch Some of these dependencies are on the old Hadoop mapred packages; others are HBase-specific. The former have to wait until the rest of Hive moves over to the new Hadoop mapreduce package, but the HBase-specific ones don't have to wait. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Open (was: Patch Available) Oops, need to move test to contrib too. add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881439#action_12881439 ] HBase Review Board commented on HIVE-1416: -- Message from: Ning Zhang n.nin...@gmail.com bq. On 2010-06-22 15:34:57, John Sichi wrote: bq. http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 408 bq. http://review.hbase.org/r/223/diff/1/?file=1551#file1551line408 bq. bq. Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. bq. bq. Ning Zhang wrote: bq. I didn't that before, but changed the way in the patch later. The reason is that getting the value of localMode via HiveConf.getVar is a hash lookup and a string comparison. It is quite expensive if it is called many times. In the current patch, the HiveConf.getVar() and string comparison are called only once and passed to the for-loop. I meant I did that before - Ning --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881440#action_12881440 ] HBase Review Board commented on HIVE-1416: -- Message from: Ning Zhang n.nin...@gmail.com bq. On 2010-06-22 15:34:57, John Sichi wrote: bq. http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 408 bq. http://review.hbase.org/r/223/diff/1/?file=1551#file1551line408 bq. bq. Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. I didn't that before, but changed the way in the patch later. The reason is that getting the value of localMode via HiveConf.getVar is a hash lookup and a string comparison. It is quite expensive if it is called many times. In the current patch, the HiveConf.getVar() and string comparison are called only once and passed to the for-loop. - Ning --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
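The performance point Ning makes in the review exchange above (evaluate the configuration lookup once, then pass the result into the loop) can be sketched generically. The names below (isLocalMode, process) are hypothetical, not Hive's actual methods.

```java
// Illustrative sketch of hoisting a per-iteration config lookup out of a
// loop: the lookup (a hash probe plus a string comparison per call) is done
// once and the boolean is reused, mirroring the HIVE-1416 patch's approach.
import java.util.List;
import java.util.Map;

public class HoistConfigSketch {
    // Stand-in for HiveConf.getVar + string comparison: relatively expensive
    // if invoked once per file.
    static boolean isLocalMode(Map<String, String> conf) {
        return "local".equals(conf.get("mapred.job.tracker"));
    }

    static int process(Map<String, String> conf, List<String> files) {
        boolean localMode = isLocalMode(conf); // evaluated once, passed down
        int handled = 0;
        for (String f : files) {
            if (localMode) {
                handled++; // per-file local-mode work elided
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(process(Map.of("mapred.job.tracker", "local"),
                                   List.of("a", "b"))); // 2
    }
}
```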
[jira] Commented: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881448#action_12881448 ] Ning Zhang commented on HIVE-1361: -- Some comments from internal design review: - The ANALYZE TABLE command should be integrated with the data replication hook. When an existing table/partition is analyzed, a new WriteEntity should be generated to make metadata replication work. - Investigate JDO on top of HBase integration. If JDO works on HBase, we could just use JDO to update column stats as well. - ANALYZE TABLE partition (partition_spec) should support dynamic-partition-style partition specification. This means that if there are 2 partition columns ds, hr, we can do analyze table partition(ds = '2010-06-01', hr) to analyze all hr sub-partitions under ds='2010-06-01'. table/partition level statistics Key: HIVE-1361 URL: https://issues.apache.org/jira/browse/HIVE-1361 Project: Hadoop Hive Issue Type: Sub-task Affects Versions: 0.6.0 Reporter: Ning Zhang Assignee: Ahmed M Aly At the first step, we gather table-level stats for non-partitioned table and partition-level stats for partitioned table. Future work could extend the table level stats to partitioned table as well. There are 3 major milestones in this subtask: 1) extend the insert statement to gather table/partition level stats on-the-fly. 2) extend metastore API to support storing and retrieving stats for a particular table/partition. 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for existing tables/partitions. The proposed stats are: Partition-level stats: - number of rows - total size in bytes - number of files - max, min, average row sizes - max, min, average file sizes Table-level stats in addition to partition level stats: - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881452#action_12881452 ] John Sichi commented on HIVE-1416: -- I think a profiler would show it as negligible in that context, but I won't argue the point. Can you fix the other one? Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1416: - Attachment: HIVE-1416.2.patch New patch that removes the accidental junk in HadoopShims.java. Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.2.patch, HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881472#action_12881472 ] Arvind Prabhakar commented on HIVE-1176: Yes - it appears that the change in behavior can be attributed to the difference in major versions. 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Patch Available (was: Open) New patch with test moved to contrib, and DESCRIBE and EXPLAIN thrown in for good measure. add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Attachment: HIVE-1304.3.patch add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881484#action_12881484 ] Paul Yang commented on HIVE-1176: - One last thing, can you include a unit test to verify the fix? 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881486#action_12881486 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Sorry - it is not clear to me what unit test I should be writing. Can you give an example, perhaps? From my perspective, any test that uses the metastore exercises this change, and together all the tests form an exhaustive layer that ensures no regression seeps into the system. Note that this is not a functionality change, only a change of underlying libraries.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881488#action_12881488 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Also, for the specific change to {{HiveMetaStoreClient.java}} - the tests under {{metastore}} validate that the new libraries are working correctly.
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1394:
-----------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
Resolution: Fixed

Committed. Thanks Ning

do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
--------------------------------------------------------------------------------------------

Key: HIVE-1394
URL: https://issues.apache.org/jira/browse/HIVE-1394
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
Fix For: 0.7.0
Attachments: HIVE-1394.2.patch, HIVE-1394.patch

Currently, purging looks at the HDFS time to see when the files were last modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to Hive. However, in some cases the operation might be performed by a housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint specifying that the operation is a cleanup operation, so that the timestamp in the metastore need not be touched in that scenario.
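The conditional update described above can be sketched in a few lines of Java. This is a minimal illustration, not Hive's actual implementation: the method name and the housekeeping flag are hypothetical, and only the `transient_lastDdlTime` parameter key comes from the issue itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the idea in HIVE-1394: only bump the DDL timestamp when
// the modification is a genuine user operation, not a housekeeping cleanup.
// markModified and isHousekeeping are illustrative names; the
// "transient_lastDdlTime" key is the metastore property named in the issue.
public class DdlTimeUpdate {
    static void markModified(Map<String, String> partitionParams, boolean isHousekeeping) {
        if (isHousekeeping) {
            return; // cleanup jobs leave the timestamp untouched
        }
        long nowSeconds = System.currentTimeMillis() / 1000L;
        partitionParams.put("transient_lastDdlTime", Long.toString(nowSeconds));
    }
}
```

The hint proposed in the issue would simply be the mechanism by which the query layer sets the boolean above.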
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881497#action_12881497 ]

Paul Yang commented on HIVE-1176:
---------------------------------

Oh, but I thought the original problem (as per the title) was an exception with 'create table if not exists tmp_select(s string, c string, n int)'? So maybe something like:

{code}
CREATE TABLE IF NOT EXISTS tmp_select(s STRING, c STRING, n INT);
DROP TABLE tmp_select;
{code}
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881499#action_12881499 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Makes sense. Will add a test case and update the patch soon. Sorry for the misunderstanding.
[jira] Commented: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881501#action_12881501 ]

Namit Jain commented on HIVE-1304:
----------------------------------

+1

will commit if the tests pass

add row_sequence UDF
--------------------

Key: HIVE-1304
URL: https://issues.apache.org/jira/browse/HIVE-1304
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
Fix For: 0.7.0
Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch

This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
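The counter behavior described above can be sketched as a plain Java class. This is an illustrative analogue, not the actual HIVE-1304 patch: the real UDF extends Hive's UDF base class, while this standalone class only shows the per-row sequencing logic.

```java
// Hedged sketch of the row_sequence() idea from HIVE-1304: a stateful
// evaluator that returns 1, 2, 3, ... for successive rows. It also shows
// why the ordering caveat matters: the number a row receives depends
// entirely on the order in which the engine happens to evaluate rows.
public class RowSequence {
    private long sequence = 0L;

    // Called once per row; each operator instance evaluates rows serially,
    // so no synchronization is attempted here.
    public long evaluate() {
        return ++sequence;
    }
}
```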
[jira] Created: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
--------------------------------------------------------------

Key: HIVE-1428
URL: https://issues.apache.org/jira/browse/HIVE-1428
Project: Hadoop Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang

If the Hive CLI is configured to use a remote metastore, ALTER TABLE ... ADD PARTITION commands will fail with an error similar to the following:

{code}
[prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e ALTER TABLE mytable add partition(datestamp = '20091101', srcid = '10',action) location '/user/pradeepk/mytable/20091101/10';
10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
FAILED: Error in metadata: org.apache.thrift.TApplicationException: get_partition failed: unknown result
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[prade...@chargesize:~/dev/howl]
{code}

This is due to a check that retrieves the partition to see whether it already exists. If it does not, the metastore attempts to return a null partition value. Since Thrift does not support null return values, an exception is thrown when the CLI is configured to use a remote metastore.
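The failure mode described above - "null means absent" works in-process but not across Thrift - can be illustrated with a minimal sketch. The class and method names below are hypothetical, and a plain Map stands in for the metastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hedged illustration of the bug pattern in HIVE-1428. With an embedded
// metastore, returning null to mean "partition not found" works; over a
// remote Thrift connection the null struct cannot be serialized, which
// surfaces as "get_partition failed: unknown result". A remote-safe API
// signals absence through an exception (which Thrift can carry) instead.
public class PartitionLookup {
    private final Map<String, String> partitions = new HashMap<>();

    void add(String name, String location) {
        partitions.put(name, location);
    }

    // In-process style: null signals absence. Fine locally, breaks over Thrift.
    String getOrNull(String name) {
        return partitions.get(name);
    }

    // Remote-safe style: absence is reported via an exception.
    String getOrThrow(String name) {
        String loc = partitions.get(name);
        if (loc == null) {
            throw new NoSuchElementException("unknown partition: " + name);
        }
        return loc;
    }
}
```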
[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated HIVE-1176:
-----------------------------------

Attachment: HIVE-1176-3.patch
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881538#action_12881538 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Updated patch with a test case attached. Please use HIVE-1176-3.patch. The changed files in this patch are as follows:

# modified: build.properties
# modified: build.xml
# new file: data/files/simple.txt
# modified: eclipse-templates/.classpath
# modified: ivy/ivysettings.xml
# deleted: lib/datanucleus-core-1.1.2.LICENSE
# deleted: lib/datanucleus-core-1.1.2.jar
# deleted: lib/datanucleus-enhancer-1.1.2.LICENSE
# deleted: lib/datanucleus-enhancer-1.1.2.jar
# deleted: lib/datanucleus-rdbms-1.1.2.LICENSE
# deleted: lib/datanucleus-rdbms-1.1.2.jar
# deleted: lib/jdo2-api-2.3-SNAPSHOT.LICENSE
# deleted: lib/jdo2-api-2.3-SNAPSHOT.jar
# modified: metastore/ivy.xml
# modified: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
# new file: ql/src/test/queries/clientpositive/hive_1176.q
# new file: ql/src/test/results/clientpositive/hive_1176.q.out