[jira] Updated: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary
[ https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1417: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed committed. Thanks Paul Archived partitions throw error with queries calling getContentSummary -- Key: HIVE-1417 URL: https://issues.apache.org/jira/browse/HIVE-1417 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch Assuming you have a src table with a ds='1' partition that is archived in HDFS, the following query will throw an exception {code} select count(1) from src where ds='1' group by key; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-535) Memory-efficient hash-based Aggregation
[ https://issues.apache.org/jira/browse/HIVE-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881121#action_12881121 ] He Yongqiang commented on HIVE-535: --- Has anyone tried google sparsehash http://code.google.com/p/google-sparsehash/ ? It's BSD license. But it seems it is in C++, and there is no Java version. Memory-efficient hash-based Aggregation --- Key: HIVE-535 URL: https://issues.apache.org/jira/browse/HIVE-535 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Currently there is a lot of memory overhead in the hash-based aggregation in GroupByOperator. The net result is that GroupByOperator cannot store many entries in its HashTable, flushes frequently, and cannot achieve a very good partial aggregation result. Here are some initial thoughts (some of them are from Joydeep a long time ago): A1. Serialize the key of the HashTable. This will eliminate the 16-byte per-object overhead of Java in keys (depending on how many objects there are in the key, the saving can be substantial). A2. Use more memory-efficient hash tables - java.util.HashMap has about 64 bytes of overhead per entry. A3. Use primitive arrays to store aggregation results. Basically, the UDAF should manage the array of aggregation results, so UDAFCount should manage a long[], UDAFAvg should manage a double[] and a long[]. The external code should pass an index to iterate/merge/terminate an aggregation result. This will eliminate the 16-byte per-object overhead of Java. More ideas are welcome.
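Idea A3 can be sketched in a few lines of plain Java (a hedged illustration only; the class and method names are made up and this is not Hive's actual UDAF interface): each aggregation buffer becomes one slot in a shared long[], so the per-group cost is 8 bytes rather than a full Java object.

{code}
// Sketch of idea A3: partial counts live in one primitive long[] instead of
// one wrapper object per group, avoiding Java's per-object overhead.
// (Illustrative names only; not Hive's real UDAF API.)
public class PrimitiveCountAggregator {
    private final long[] counts;   // one slot per aggregation buffer

    public PrimitiveCountAggregator(int capacity) {
        counts = new long[capacity];
    }

    // The caller passes an index instead of an aggregation object.
    public void iterate(int idx) { counts[idx]++; }              // one input row
    public void merge(int idx, long partial) { counts[idx] += partial; }
    public long terminate(int idx) { return counts[idx]; }
}
{code}

A UDAFAvg-style aggregator would hold a double[] of sums and a long[] of counts side by side in the same way.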
[jira] Updated: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name
[ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-1342: - Attachment: ppd_same_alias_1.patch I think PPD is unnecessarily resolving table aliases when encountering CommonJoinOperator. I attached a patch fixing it. Please have a review. Predicate push down get error result when sub-queries have the same alias name --- Key: HIVE-1342 URL: https://issues.apache.org/jira/browse/HIVE-1342 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.4.2, 0.5.0 Reporter: Ted Xu Priority: Critical Attachments: cmd.hql, explain, ppd_same_alias_1.patch Query is over-optimized by PPD when sub-queries have the same alias name, see the query:
{code}
create table if not exists dm_fact_buyer_prd_info_d (
  category_id string,
  gmv_trade_num int,
  user_id int
) PARTITIONED BY (ds int);

set hive.optimize.ppd=true;
set hive.map.aggr=true;

explain
select category_id1, category_id2, assoc_idx
from (
  select category_id1, category_id2, count(distinct user_id) as assoc_idx
  from (
    select t1.category_id as category_id1,
           t2.category_id as category_id2,
           t1.user_id
    from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t1
    join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t2 on t1.user_id = t2.user_id
  ) t1
  group by category_id1, category_id2
) t_o
where category_id1 <> category_id2 and assoc_idx > 2;
{code}
The query above fails when executed, throwing the exception: cannot cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text).
I ran explain on the query and the execution plan looks really weird (only Stage-1 shown; see the highlighted predicate):
{code}
Stage: Stage-1
  Map Reduce
    Alias -> Map Operator Tree:
      t_o:t1:t1:dm_fact_buyer_prd_info_d
        TableScan
          alias: dm_fact_buyer_prd_info_d
          Filter Operator
            predicate:
                expr: *(category_id <> user_id)*
                type: boolean
          Select Operator
            expressions:
                expr: category_id
                type: string
                expr: user_id
                type: bigint
            outputColumnNames: category_id, user_id
            Group By Operator
              keys:
                    expr: category_id
                    type: string
                    expr: user_id
                    type: bigint
              mode: hash
              outputColumnNames: _col0, _col1
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: bigint
                sort order: ++
                Map-reduce partition columns:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: bigint
                tag: -1
    Reduce Operator Tree:
      Group By Operator
        keys:
              expr: KEY._col0
              type: string
              expr: KEY._col1
              type: bigint
        mode: mergepartial
        outputColumnNames: _col0, _col1
        Select Operator
          expressions:
                expr: _col0
                type: string
                expr: _col1
                type: bigint
          outputColumnNames: _col0, _col1
          File Output Operator
            compressed: true
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
{code}
If predicate push down is disabled (set hive.optimize.ppd=false), the error is gone; I tried
Hudson build is back to normal : Hive-trunk-h0.18 #480
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/480/
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881255#action_12881255 ] Namit Jain commented on HIVE-1359: -- Requirements 2 and 3 are not addressed in the above patch - talked to Ning offline, and we can do them in a follow-up. Filed a new jira for the same: https://issues.apache.org/jira/browse/HIVE-1424 Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1359.patch, unit_tests.txt Some features in Hive only work for certain Hadoop versions through shims. However, the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we have different output for different Hadoop versions. One example is CombineHiveInputFormat, which is only available from Hadoop 0.20. The plans using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archived partitions (HAR), which are also only available from 0.20.
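The shim pattern the issue builds on can be sketched roughly as follows (a hedged illustration with made-up names - this is not Hive's actual HadoopShims interface): version-specific behavior sits behind one interface, and a shim-aware test would pick its expected output through it.

{code}
// Sketch of the shim idea: per-Hadoop-version behavior behind one interface.
// (Names are illustrative, not Hive's real shim API.)
public class ShimDemo {
    interface HadoopVersionShim {
        boolean supportsCombineInputFormat();   // only true from Hadoop 0.20 on
    }
    static class Hadoop18Shim implements HadoopVersionShim {
        public boolean supportsCombineInputFormat() { return false; }
    }
    static class Hadoop20Shim implements HadoopVersionShim {
        public boolean supportsCombineInputFormat() { return true; }
    }

    // A shim-aware test would select the expected plan based on the shim,
    // instead of keeping a single expected output for all versions.
    static String expectedPlan(HadoopVersionShim shim) {
        return shim.supportsCombineInputFormat()
                ? "CombineHiveInputFormat" : "HiveInputFormat";
    }
}
{code}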
[jira] Created: (HIVE-1424) Unit tests should be shim aware
Unit tests should be shim aware --- Key: HIVE-1424 URL: https://issues.apache.org/jira/browse/HIVE-1424 Project: Hadoop Hive Issue Type: New Feature Components: Testing Infrastructure Reporter: Namit Jain Follow-up of https://issues.apache.org/jira/browse/HIVE-1359, requirements 2 and 3
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881258#action_12881258 ] Namit Jain commented on HIVE-1359: -- This is also needed for https://issues.apache.org/jira/browse/HIVE-1307 in 0.6. We can fix https://issues.apache.org/jira/browse/HIVE-1424 in 0.7; it need not be merged into 0.6.
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881274#action_12881274 ] Namit Jain commented on HIVE-1359: -- Looks good to me - John, can you also review?
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881285#action_12881285 ] John Sichi commented on HIVE-1359: -- Since we're not actually dealing with the minimr requirement in this patch, it's probably better to just leave out those changes completely and we'll address them in HIVE-117. In particular, I don't think the cluster mode should be part of the test code generation; we want it completely dynamic so that we can re-run the same test in either mode without regenerating code. Minor nitpicks (these can be fixed in the follow-up instead of now): * hadoopVersion = new String() is the same as hadoopVersion = "" * usage of Stack is deprecated since it is based on the synchronized Vector
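The second nitpick above can be illustrated with a small, hedged example: java.util.Stack inherits its per-operation synchronization from Vector, while ArrayDeque offers the same LIFO operations without locking and is the usual replacement.

{code}
// Minimal sketch: ArrayDeque in place of the Vector-based java.util.Stack.
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeInsteadOfStack {
    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");                  // same LIFO operations as Stack
        stack.push("b");
        System.out.println(stack.pop());  // b
        System.out.println(stack.peek()); // a
    }
}
{code}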
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Fix Version/s: 0.7.0 Affects Version/s: 0.6.0 (was: 0.7.0) add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
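What row_sequence() does can be sketched in a few lines (a hedged illustration, not the attached patch - the real UDF plugs into Hive's UDF machinery, which this does not): a per-instance counter that starts handing out numbers from 1.

{code}
// Sketch of a row_sequence()-style stateful function: each call returns
// the next number in sequence. As the issue warns, there is no ordering
// guarantee in SQL processing internals, so the numbering depends on
// execution order.
public class RowSequence {
    private long sequence = 0L;   // per-instance counter

    public long evaluate() {
        return ++sequence;        // first call returns 1, then 2, ...
    }
}
{code}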
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881293#action_12881293 ] Paul Yang commented on HIVE-1176: - For some reason, I don't see the JDO files being deleted after applying the patch:
{code}
?       build.xml.orig
?       HIVE-1176-2.patch
?       test.log
M       eclipse-templates/.classpath
M       build.properties
M       build.xml
?       metastore/test.log
M       metastore/ivy.xml
M       metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
!       lib/jdo2-api-2.3-SNAPSHOT.LICENSE
!       lib/datanucleus-rdbms-1.1.2.LICENSE
!       lib/datanucleus-enhancer-1.1.2.LICENSE
!       lib/datanucleus-core-1.1.2.LICENSE
M       ivy/ivysettings.xml
{code}
Also, the patch works for branch 0.6 but not for trunk. Can you regenerate it? 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
{code}
hive> create table if not exists tmp_select(s string, c string, n int);
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
	at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
	... 15 more
{code}
[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881300#action_12881300 ] Namit Jain commented on HIVE-1422: -- +1, looks good skip counter update when RunningJob.getCounters() returns null -- Key: HIVE-1422 URL: https://issues.apache.org/jira/browse/HIVE-1422 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1422.1.patch Under heavy load circumstances on some Hadoop versions, we may get an NPE from trying to dereference a null Counters object. I don't have a unit test which can reproduce it, but here's an example stack from a production cluster we saw today:
{code}
10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 with exception 'java.lang.NullPointerException(null)'
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
{code}
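The guard the issue title describes can be sketched as follows (a hedged illustration against a stubbed job handle; JobHandle and updateCounters here are stand-ins, not Hadoop's or Hive's actual signatures):

{code}
// Sketch of the "skip counter update when getCounters() returns null" idea.
// JobHandle stands in for Hadoop's RunningJob; the real Counters type is richer.
import java.util.Map;

public class CounterUpdateGuard {
    interface JobHandle {
        Map<String, Long> getCounters();   // may return null under heavy load
    }

    // Returns true if counters were applied, false if the round was skipped.
    static boolean updateCounters(JobHandle job, Map<String, Long> sink) {
        Map<String, Long> ctrs = job.getCounters();
        if (ctrs == null) {
            // Dereferencing the null result is what caused the NPE in the
            // stack above, so skip this update round instead.
            return false;
        }
        sink.putAll(ctrs);
        return true;
    }
}
{code}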
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881306#action_12881306 ] Ashish Thusoo commented on HIVE-1271: - I am looking at this. Case sensitiveness of type information specified when using custom reducer causes type mismatch --- Key: HIVE-1271 URL: https://issues.apache.org/jira/browse/HIVE-1271 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1271-1.patch, HIVE-1271.patch Type information specified while using a custom reduce script is converted to lower case, which causes a type mismatch during query semantic analysis. The following REDUCE query, where a field was named userId, failed:
{code}
hive> CREATE TABLE SS ( a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> );
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
      INSERT OVERWRITE TABLE SS
      REDUCE * USING 'myreduce.py' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.
{code}
The same query worked fine after changing userId to userid.
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881319#action_12881319 ] Ashish Thusoo commented on HIVE-1271: - Looks good to me. However, why remove the check on Category? Also, why drop the default implementation of the equals method for TypeInfo?
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Open (was: Patch Available)
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Attachment: HIVE-1304.2.patch
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Attachment: HIVE-1394.patch Adding a new hint to avoid updating transient_lastDdlTime for both tables and partitions in the metastore. do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Paul Yang Attachments: HIVE-1394.patch Currently, purging looks at the HDFS time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to Hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched in that scenario.
[jira] Assigned: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1394: Assignee: Ning Zhang (was: Paul Yang)
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Status: Patch Available (was: Open)
Hudson build is back to normal : Hive-trunk-h0.20 #302
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/302/changes
[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)
[ https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881361#action_12881361 ] John Sichi commented on HIVE-1364: -- Currently the view scripts are only in the wiki: http://wiki.apache.org/hadoop/Hive/ViewDev#Metastore_Upgrades Per discussion with Ashish, we should open a separate JIRA issue for (at a minimum) packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. Carl, do you want to take that on as part of release mgmt? Increase the maximum length of SERDEPROPERTIES values (currently 767 characters) Key: HIVE-1364 URL: https://issues.apache.org/jira/browse/HIVE-1364 Project: Hadoop Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0 Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch The value component of a SERDEPROPERTIES key/value pair is currently limited to a maximum length of 767 characters. I believe that the motivation for limiting the length to 767 characters is that this value is the maximum allowed length of an index in a MySQL database running on the InnoDB engine: http://bugs.mysql.com/bug.php?id=13315 * The Metastore O/R mapping currently limits many fields (including SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite the fact that these fields are not indexed. * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535. * We can expect many users to hit the 767 character limit on SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping serde property to map a table that has many columns. I propose increasing the maximum allowed length of SERDEPROPERTIES.PARAM_VALUE to 8192.
[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1422: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks John
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881368#action_12881368 ] Arvind Prabhakar commented on HIVE-1271: @Ashish: Thanks for looking at the patch. bq. why remove the check on Category? I modified all the specialized type infos to be {{final}} - which in turn ensures that if the test on {{instanceof}} succeeds, then they have to be the same category type. Therefore, the check on category was redundant going forward. bq. Also why drop the default implementation of the equals method for TypeInfo? I did this for two main reasons - first that fact that it was implementing the {{equals()}} but not {{hashCode()}} method. This could lead to unexpected behavior when {{TypeInfo}} instances were put in collections. Second, the implementation was modified to make both {{equals()}} and {{hashCode()}} methods to be made abstract in order to force any (new) child classes to make sure that they implement both consistently. Let me know if you would like to tweak this change as necessary. Case sensitiveness of type information specified when using custom reducer causes type mismatch --- Key: HIVE-1271 URL: https://issues.apache.org/jira/browse/HIVE-1271 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1271-1.patch, HIVE-1271.patch Type information specified while using a custom reduce script is converted to lower case, and causes type mismatch during query semantic analysis . The following REDUCE query where field name = userId failed. 
hive> CREATE TABLE SS ( a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> ); OK hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s INSERT OVERWRITE TABLE SS REDUCE * USING 'myreduce.py' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>> ) ; FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>. The same query worked fine after changing userId to userid. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
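The equals()/hashCode() contract raised in the HIVE-1271 comment above can be illustrated with a minimal value class. `SimpleTypeInfo` is a hypothetical stand-in for Hive's `TypeInfo` hierarchy, not the real class; it just shows why overriding `equals()` without a consistent `hashCode()` misbehaves in hash-based collections.

```java
// Illustrative sketch, not Hive code: a final value class whose equals() and
// hashCode() are implemented consistently, as the HIVE-1271 patch requires of
// TypeInfo subclasses.
import java.util.HashSet;
import java.util.Set;

public final class SimpleTypeInfo {
    private final String typeName;

    public SimpleTypeInfo(String typeName) { this.typeName = typeName; }

    @Override public boolean equals(Object o) {
        // On a final class, a successful instanceof test guarantees the same
        // concrete type - which is why the separate category check became
        // redundant in the patch.
        if (!(o instanceof SimpleTypeInfo)) return false;
        return typeName.equals(((SimpleTypeInfo) o).typeName);
    }

    // Consistent with equals(): equal objects produce equal hash codes.
    @Override public int hashCode() { return typeName.hashCode(); }

    public static void main(String[] args) {
        Set<SimpleTypeInfo> set = new HashSet<>();
        set.add(new SimpleTypeInfo("int"));
        // Without a matching hashCode(), this lookup could miss the entry.
        System.out.println(set.contains(new SimpleTypeInfo("int"))); // true
    }
}
```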
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881381#action_12881381 ] Arvind Prabhakar commented on HIVE-1176: @Paul: I just tested the patch (HIVE-1176-2.patch) on latest trunk and it seems to apply cleanly. Can you please try again and see if it works? Also, can you post the errors that you are seeing? If necessary, I can break down the patch into single-file units to help with applying it. Just let me know either way. 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1394: - Attachment: HIVE-1394.2.patch Added a check in SemanticAnalyzer to throw an exception when HOLD_DDLTIME is specified in a dynamic partition insert or on non-existent static partitions. A negative test case is also added. do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1394.2.patch, HIVE-1394.patch Currently, purging looks at the hdfs time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched for that scenario. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881414#action_12881414 ] Namit Jain commented on HIVE-1394: -- +1 will commit if the tests pass do not update transient_lastDdlTime if the partition is modified by a housekeeping operation Key: HIVE-1394 URL: https://issues.apache.org/jira/browse/HIVE-1394 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1394.2.patch, HIVE-1394.patch Currently, purging looks at the hdfs time to see the last time the files got modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to hive. However, in some cases, the operation might be performed by some housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint which specifies that the operation is a cleanup operation, and the timestamp in the metastore need not be touched for that scenario. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)
[ https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881416#action_12881416 ] Carl Steinbach commented on HIVE-1364: -- bq. Also why do we make everything 4000 bytes - I presume things like ftype will never hit that limit. Currently the ORM is the de facto enforcement mechanism for string length limitations. I think this is a bad approach since 1) users can work around it by manually altering the underlying tables, and 2) the limits are stated in terms of bytes so the actual length restriction in terms of number of characters will depend on the character set of the underlying DB. In light of this I bumped every size limit to 4000 bytes, and also because I did not want to try to predict which property length limit someone would next bump into. I'm willing to revert these limits to their original values. Are there any properties besides ftype which you want me to revert? Should I revert everything except SERDEPROPERTIES.PARAM_VALUE? bq. Also changes to upgrade SQL should also be a part of the patch, no? Where are the scripts for the view change located? I'll update the patch with the necessary scripts. Should these go in bin/ or somewhere under metastore/ ? @John: Yes, I think this falls under the responsibility of the release manager. I will take care of it. Increase the maximum length of SERDEPROPERTIES values (currently 767 characters) Key: HIVE-1364 URL: https://issues.apache.org/jira/browse/HIVE-1364 Project: Hadoop Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0 Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch The value component of a SERDEPROPERTIES key/value pair is currently limited to a maximum length of 767 characters. 
I believe that the motivation for limiting the length to 767 characters is that this value is the maximum allowed length of an index in a MySQL database running on the InnoDB engine: http://bugs.mysql.com/bug.php?id=13315 * The Metastore OR mapping currently limits many fields (including SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite the fact that these fields are not indexed. * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535. * We can expect many users to hit the 767 character limit on SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping serdeproperty to map a table that has many columns. I propose increasing the maximum allowed length of SERDEPROPERTIES.PARAM_VALUE to 8192. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)
Provide metastore schema migration scripts (0.5 - 0.6) --- Key: HIVE-1427 URL: https://issues.apache.org/jira/browse/HIVE-1427 Project: Hadoop Hive Issue Type: Task Components: Metastore Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 At a minimum this ticket covers packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. This is also probably a good point at which to decide and clearly state which Metastore DBs we officially support in production, e.g. do we need to provide migration scripts for Derby? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881426#action_12881426 ] HBase Review Board commented on HIVE-1416: -- Message from: John Sichi jsi...@facebook.com --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java http://review.hbase.org/r/223/#comment1126 Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. http://svn.apache.org/repos/asf/hadoop/hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java http://review.hbase.org/r/223/#comment1124 This anObject insertion looks like it was accidental. - John Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
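The file-name parsing problem described in HIVE-1416 can be sketched generically. The pattern and method names below are illustrative only; the actual output file names (and therefore the patterns) vary by Hadoop version and execution mode, which is exactly why the issue moves this parsing into shims.

```java
// Hypothetical illustration of extracting a task ID from a mapper/reducer
// output file name. Real Hadoop names range from plain "part-00005" to
// attempt-style names depending on version and local/cluster mode; this
// sketch handles only the simple "part-NNNNN" form.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskIdParseSketch {
    private static final Pattern PART = Pattern.compile("part-(\\d+)");

    // Returns the numeric task ID component, or null if the name does not
    // match the expected form (a version-specific shim would handle others).
    static String taskIdFromFileName(String name) {
        Matcher m = PART.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(taskIdFromFileName("part-00005")); // 00005
        System.out.println(taskIdFromFileName("weird-name")); // null
    }
}
```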
[jira] Updated: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)
[ https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1427: - Affects Version/s: (was: 0.5.0) Provide metastore schema migration scripts (0.5 - 0.6) --- Key: HIVE-1427 URL: https://issues.apache.org/jira/browse/HIVE-1427 Project: Hadoop Hive Issue Type: Task Components: Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 At a minimum this ticket covers packaging up example MySQL migration scripts (cumulative across all schema changes from 0.5 to 0.6) and explaining what to do with them in the release notes. This is also probably a good point at which to decide and clearly state which Metastore DBs we officially support in production, e.g. do we need to provide migration scripts for Derby? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881431#action_12881431 ] John Sichi commented on HIVE-1359: -- +1. Will commit when tests pass. Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1359.2.patch, HIVE-1359.patch, unit_tests.txt Some features in Hive only work for certain Hadoop versions through shims. However the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we will have different output for different Hadoop versions. One example is CombineHiveInputFormat which is only available from Hadoop 0.20. The plan using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archival partitions (HAR) which is also only available from 0.20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1229) replace dependencies on HBase deprecated API
[ https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881432#action_12881432 ] Basab Maulik commented on HIVE-1229: ... recreating patch against current trunk. Thanks for the feedback! replace dependencies on HBase deprecated API Key: HIVE-1229 URL: https://issues.apache.org/jira/browse/HIVE-1229 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Basab Maulik Attachments: HIVE-1129.1.patch Some of these dependencies are on the old Hadoop mapred packages; others are HBase-specific. The former have to wait until the rest of Hive moves over to the new Hadoop mapreduce package, but the HBase-specific ones don't have to wait. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Open (was: Patch Available) Oops, need to move test to contrib too. add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881439#action_12881439 ] HBase Review Board commented on HIVE-1416: -- Message from: Ning Zhang n.nin...@gmail.com bq. On 2010-06-22 15:34:57, John Sichi wrote: bq. http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 408 bq. http://review.hbase.org/r/223/diff/1/?file=1551#file1551line408 bq. bq. Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. bq. bq. Ning Zhang wrote: bq. I didn't that before, but changed the way in the patch later. The reason is that getting the value of localMode via HiveConf.getVar is a hash lookup and a string comparison. It is quite expensive if it is called many times. In the current patch, the HiveConf.getVar() and string comparison are called only once and passed to the for-loop. I meant I did that before - Ning --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881440#action_12881440 ] HBase Review Board commented on HIVE-1416: -- Message from: Ning Zhang n.nin...@gmail.com bq. On 2010-06-22 15:34:57, John Sichi wrote: bq. http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 408 bq. http://review.hbase.org/r/223/diff/1/?file=1551#file1551line408 bq. bq. Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check. I didn't that before, but changed the way in the patch later. The reason is that getting the value of localMode via HiveConf.getVar is a hash lookup and a string comparison. It is quite expensive if it is called many times. In the current patch, the HiveConf.getVar() and string comparison are called only once and passed to the for-loop. - Ning --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/223/#review268 --- Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
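The performance point Ning makes in the review exchange above (evaluate the configuration lookup once, then pass the result into the loop) can be sketched generically. The names below (isLocalMode, process) are hypothetical, not Hive's actual methods.

```java
// Illustrative sketch of hoisting a per-iteration config lookup out of a
// loop: the lookup (a hash probe plus a string comparison per call) is done
// once and the boolean is reused, mirroring the HIVE-1416 patch's approach.
import java.util.List;
import java.util.Map;

public class HoistConfigSketch {
    // Stand-in for HiveConf.getVar + string comparison: relatively expensive
    // if invoked once per file.
    static boolean isLocalMode(Map<String, String> conf) {
        return "local".equals(conf.get("mapred.job.tracker"));
    }

    static int process(Map<String, String> conf, List<String> files) {
        boolean localMode = isLocalMode(conf); // evaluated once, passed down
        int handled = 0;
        for (String f : files) {
            if (localMode) {
                handled++; // per-file local-mode work elided
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(process(Map.of("mapred.job.tracker", "local"),
                                   List.of("a", "b"))); // 2
    }
}
```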
[jira] Commented: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881448#action_12881448 ] Ning Zhang commented on HIVE-1361: -- Some comments from internal design review: - The ANALYZE TABLE command should be integrated with the data replication hook. When an existing table/partition is analyzed, a new WriteEntity should be generated to make metadata replication work. - Investigate JDO on top of HBase integration. If JDO works on HBase, we could just use JDO to update column stats as well. - ANALYZE TABLE partition (partition_spec) should support dynamic-partition-style partition specification. This means that if there are 2 partition columns ds, hr, we can do analyze table partition(ds = '2010-06-01', hr) to analyze all hr sub-partitions under ds='2010-06-01'. table/partition level statistics Key: HIVE-1361 URL: https://issues.apache.org/jira/browse/HIVE-1361 Project: Hadoop Hive Issue Type: Sub-task Affects Versions: 0.6.0 Reporter: Ning Zhang Assignee: Ahmed M Aly At the first step, we gather table-level stats for non-partitioned table and partition-level stats for partitioned table. Future work could extend the table level stats to partitioned table as well. There are 3 major milestones in this subtask: 1) extend the insert statement to gather table/partition level stats on-the-fly. 2) extend metastore API to support storing and retrieving stats for a particular table/partition. 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for existing tables/partitions. The proposed stats are: Partition-level stats: - number of rows - total size in bytes - number of files - max, min, average row sizes - max, min, average file sizes Table-level stats in addition to partition level stats: - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881452#action_12881452 ] John Sichi commented on HIVE-1416: -- I think a profiler would show it as negligible in that context, but I won't argue the point. Can you fix the other one? Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1416: - Attachment: HIVE-1416.2.patch New patch that removes the accidental junk in HadoopShims.java. Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode -- Key: HIVE-1416 URL: https://issues.apache.org/jira/browse/HIVE-1416 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.6.0, 0.7.0 Attachments: HIVE-1416.2.patch, HIVE-1416.patch Hive parses the file name generated by tasks to figure out the task ID in order to generate files for empty buckets. Different hadoop versions and execution mode have different ways of naming output files by mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881472#action_12881472 ] Arvind Prabhakar commented on HIVE-1176: Yes - it appears that the change in behavior can be attributed to the difference in major versions. 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Status: Patch Available (was: Open) New patch with test moved to contrib, and DESCRIBE and EXPLAIN thrown in for good measure. add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1304: - Attachment: HIVE-1304.3.patch add row_sequence UDF Key: HIVE-1304 URL: https://issues.apache.org/jira/browse/HIVE-1304 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881484#action_12881484 ] Paul Yang commented on HIVE-1176: - One last thing, can you include a unit test to verify the fix? 'create if not exists' fails for a table name with 'select' in it - Key: HIVE-1176 URL: https://issues.apache.org/jira/browse/HIVE-1176 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Prasad Chakka Assignee: Arvind Prabhakar Fix For: 0.6.0 Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch hive> create table if not exists tmp_select(s string, c string, n int); org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT) at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881486#action_12881486 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Sorry - it is not clear to me what unit test I should be writing. Can you give an example, perhaps? From my perspective, any test that uses the metastore exercises this change, and together all the tests form an exhaustive layer that ensures no regression seeps into the system. Note that this is not a functionality change, only a change of underlying libraries.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881488#action_12881488 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Also, for the specific change to {{HiveMetaStoreClient.java}} - the tests under {{metastore}} validate that the new libraries are working correctly.
[jira] Updated: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
[ https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1394:
-----------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
Resolution: Fixed

Committed. Thanks Ning

do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
--------------------------------------------------------------------------------------------

Key: HIVE-1394
URL: https://issues.apache.org/jira/browse/HIVE-1394
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
Fix For: 0.7.0
Attachments: HIVE-1394.2.patch, HIVE-1394.patch

Currently, purging looks at the HDFS time to see when the files were last modified. It should look at the metastore property instead - these are Facebook-specific utilities, which do not require any changes to Hive. However, in some cases the operation might be performed by a housekeeping job, which should not modify the timestamp. Since Hive has no way of knowing the origin of the query, it might be a good idea to add a new hint specifying that the operation is a cleanup operation, so that the timestamp in the metastore need not be touched in that scenario.
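The conditional update described above can be sketched in a few lines of Java. This is a minimal illustration, not Hive's actual implementation: the method name and the housekeeping flag are hypothetical, and only the `transient_lastDdlTime` parameter key comes from the issue itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the idea in HIVE-1394: only bump the DDL timestamp when
// the modification is a genuine user operation, not a housekeeping cleanup.
// markModified and isHousekeeping are illustrative names; the
// "transient_lastDdlTime" key is the metastore property named in the issue.
public class DdlTimeUpdate {
    static void markModified(Map<String, String> partitionParams, boolean isHousekeeping) {
        if (isHousekeeping) {
            return; // cleanup jobs leave the timestamp untouched
        }
        long nowSeconds = System.currentTimeMillis() / 1000L;
        partitionParams.put("transient_lastDdlTime", Long.toString(nowSeconds));
    }
}
```

The hint proposed in the issue would simply be the mechanism by which the query layer sets the boolean above.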
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881497#action_12881497 ]

Paul Yang commented on HIVE-1176:
---------------------------------

Oh, but I thought the original problem (as per the title) was an exception with 'create table if not exists tmp_select(s string, c string, n int)'? So maybe something like:

{code}
CREATE TABLE IF NOT EXISTS tmp_select(s STRING, c STRING, n INT);
DROP TABLE tmp_select;
{code}
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881499#action_12881499 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Makes sense. Will add a test case and update the patch soon. Sorry for the misunderstanding.
[jira] Commented: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881501#action_12881501 ]

Namit Jain commented on HIVE-1304:
----------------------------------

+1

will commit if the tests pass

add row_sequence UDF
--------------------

Key: HIVE-1304
URL: https://issues.apache.org/jira/browse/HIVE-1304
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
Fix For: 0.7.0
Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch

This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1. I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF. The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
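The counter behavior described above can be sketched as a plain Java class. This is an illustrative analogue, not the actual HIVE-1304 patch: the real UDF extends Hive's UDF base class, while this standalone class only shows the per-row sequencing logic.

```java
// Hedged sketch of the row_sequence() idea from HIVE-1304: a stateful
// evaluator that returns 1, 2, 3, ... for successive rows. It also shows
// why the ordering caveat matters: the number a row receives depends
// entirely on the order in which the engine happens to evaluate rows.
public class RowSequence {
    private long sequence = 0L;

    // Called once per row; each operator instance evaluates rows serially,
    // so no synchronization is attempted here.
    public long evaluate() {
        return ++sequence;
    }
}
```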
[jira] Created: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
--------------------------------------------------------------

Key: HIVE-1428
URL: https://issues.apache.org/jira/browse/HIVE-1428
Project: Hadoop Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang

If the Hive CLI is configured to use a remote metastore, ALTER TABLE ... ADD PARTITION commands will fail with an error similar to the following:

{code}
[prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e ALTER TABLE mytable add partition(datestamp = '20091101', srcid = '10',action) location '/user/pradeepk/mytable/20091101/10';
10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
FAILED: Error in metadata: org.apache.thrift.TApplicationException: get_partition failed: unknown result
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[prade...@chargesize:~/dev/howl]
{code}

This is due to a check that retrieves the partition to see whether it already exists. If it does not, the metastore attempts to return a null partition value. Since Thrift does not support null return values, an exception is thrown when the CLI is configured to use a remote metastore.
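The failure mode described above - "null means absent" works in-process but not across Thrift - can be illustrated with a minimal sketch. The class and method names below are hypothetical, and a plain Map stands in for the metastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hedged illustration of the bug pattern in HIVE-1428. With an embedded
// metastore, returning null to mean "partition not found" works; over a
// remote Thrift connection the null struct cannot be serialized, which
// surfaces as "get_partition failed: unknown result". A remote-safe API
// signals absence through an exception (which Thrift can carry) instead.
public class PartitionLookup {
    private final Map<String, String> partitions = new HashMap<>();

    void add(String name, String location) {
        partitions.put(name, location);
    }

    // In-process style: null signals absence. Fine locally, breaks over Thrift.
    String getOrNull(String name) {
        return partitions.get(name);
    }

    // Remote-safe style: absence is reported via an exception.
    String getOrThrow(String name) {
        String loc = partitions.get(name);
        if (loc == null) {
            throw new NoSuchElementException("unknown partition: " + name);
        }
        return loc;
    }
}
```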
[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated HIVE-1176:
-----------------------------------

Attachment: HIVE-1176-3.patch
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881538#action_12881538 ]

Arvind Prabhakar commented on HIVE-1176:
----------------------------------------

Updated patch with a test case attached. Please use HIVE-1176-3.patch. The changed files in this patch are as follows:

# modified: build.properties
# modified: build.xml
# new file: data/files/simple.txt
# modified: eclipse-templates/.classpath
# modified: ivy/ivysettings.xml
# deleted: lib/datanucleus-core-1.1.2.LICENSE
# deleted: lib/datanucleus-core-1.1.2.jar
# deleted: lib/datanucleus-enhancer-1.1.2.LICENSE
# deleted: lib/datanucleus-enhancer-1.1.2.jar
# deleted: lib/datanucleus-rdbms-1.1.2.LICENSE
# deleted: lib/datanucleus-rdbms-1.1.2.jar
# deleted: lib/jdo2-api-2.3-SNAPSHOT.LICENSE
# deleted: lib/jdo2-api-2.3-SNAPSHOT.jar
# modified: metastore/ivy.xml
# modified: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
# new file: ql/src/test/queries/clientpositive/hive_1176.q
# new file: ql/src/test/results/clientpositive/hive_1176.q.out