[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Attachment: HIVE-1307.3.patch HIVE-1307.3_java.patch Uploading HIVE-1307.3.patch and HIVE-1307.3_java.patch (Java changes only). This patch fixes a bug in dynamic partition insert (adding the partition column property in GenMRFileSink1.java). Also added one unit test case, merge4.q, for this case. > More generic and efficient merge method > --- > > Key: HIVE-1307 > URL: https://issues.apache.org/jira/browse/HIVE-1307 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, > HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch > > > Currently, if hive.merge.mapfiles/mapredfiles=true, a new MapReduce job is > created to read the input files and output to one reducer for merging. This MR > job is created at compile time, one MR job per partition. In the dynamic > partition case, multiple partitions may be created at execution time, so > generating the merging MR job at compile time is impossible. > We should generalize the merge framework to allow multiple partitions; most > of the time a map-only job should be sufficient if we use > CombineHiveInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1505) Support non-UTF8 data
[ https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-1505: - Attachment: trunk-encoding.patch We implemented a per-table encoding configuration feature. Set the table encoding through a serde parameter, for example:
{code}
alter table src set serdeproperties ('serialization.encoding'='GBK');
{code}
which makes table src use the GBK encoding (a Chinese encoding format). Furthermore, if you use the command-line interface, the parameter 'hive.cli.encoding' must also be set. 'hive.cli.encoding' must be set before the hive prompt starts, so set 'hive.cli.encoding' in hive-site.xml or pass -hiveconf hive.cli.encoding=GBK as a command-line parameter, instead of running 'set hive.cli.encoding=GBK' in Hive QL. For this reason, I can't find a way to add a unit test. > Support non-UTF8 data > - > > Key: HIVE-1505 > URL: https://issues.apache.org/jira/browse/HIVE-1505 > Project: Hadoop Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Affects Versions: 0.5.0 >Reporter: bc Wong > Attachments: trunk-encoding.patch > > > I'd like to work with non-UTF8 data easily. > Suppose I have data in latin1. Currently, doing a "select *" will return the > upper ascii characters as '\xef\xbf\xbd', which is the replacement character > '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different > encodings, or to have a concept of byte string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
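The replacement-character behavior that the issue describes (latin1 upper-ascii bytes surfacing as '\xef\xbf\xbd') is easy to reproduce outside Hive. A Python sketch of the byte-level effect, not Hive code:

```python
# A latin1-encoded string containing an upper-ascii byte (0xE9 for 'é').
data = "café".encode("latin-1")

# Decoding those bytes as UTF-8 — what a reader with no encoding support
# effectively does — turns the invalid byte into the replacement
# character U+FFFD.
text = data.decode("utf-8", errors="replace")
assert text == "caf\ufffd"

# Re-encoded as UTF-8, U+FFFD is exactly the '\xef\xbf\xbd' byte sequence
# quoted in the issue description.
assert text.encode("utf-8") == b"caf\xef\xbf\xbd"
```

This is why a serde-level 'serialization.encoding' setting (so the bytes are decoded with the right charset in the first place) fixes the symptom.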
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900178#action_12900178 ] Amareshwari Sriramadasu commented on HIVE-1561: --- When I tried an SMB join on my local machine (pseudo-distributed mode), I saw wrong results for the join. I think that if there is more than one mapper, the join logic does not work correctly. Here is my run:
{noformat}
hive> describe extended smb_input;
OK
key  int
value  int
Detailed Table Information  Table(tableName:smb_input, dbName:default, owner:amarsri, createTime:1282026968, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:int, comment:null)], location:hdfs://localhost:19000/user/hive/warehouse/smb_input, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[key], sortCols:[Order(col:key, order:1)], parameters:{}), partitionKeys:[], parameters:{SORTBUCKETCOLSPREFIX=TRUE, transient_lastDdlTime=1282027032}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.05 seconds
hive> select * from smb_input;
OK
12  35
48  40
100  100
Time taken: 0.343 seconds
hive> set hive.optimize.bucketmapjoin = true;
hive> set hive.optimize.bucketmapjoin.sortedmerge = true;
hive> select /*+ MAPJOIN(a) */ * from smb_input a join smb_input b on a.key=b.key;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008031340_0170, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201008031340_0170
Kill Command = /home/amarsri/workspace/Yahoo20/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:19101 -kill job_201008031340_0170
2010-08-19 11:04:00,040 Stage-1 map = 0%, reduce = 0%
2010-08-19 11:04:10,253 Stage-1 map = 50%, reduce = 0%
2010-08-19 11:04:13,271 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:13,636 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:19,664 Stage-1 map = 50%, reduce = 0%
2010-08-19 11:05:25,733 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:28,762 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201008031340_0170
OK
12  35  12  35
48  40  48  40
Time taken: 100.056 seconds
Expected output:
12  35  12  35
48  40  48  40
100  100  100  100
{noformat}
The MapReduce job launched for the join has 2 maps. The second map's first attempt (attempt_201008031340_0170_m_01_0) fails with the following exception:
{noformat}
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: replace taskId from execContext
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 00_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/00_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.00_0
2010-08-19 11:04:07,196 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/00_0
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 forwarded 5 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 Close done
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-19 11:05:08,656 ERROR ExecMapper: Hit error while closing operators - failing tree
2010-08-19 11:05:08,658 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:253)
	at org.apache.hado
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900176#action_12900176 ] Namit Jain commented on HIVE-1561: -- My bad, I did not see the entire results - so, based on what Joy is saying, it does not work in minimr mode > smb_mapjoin_8.q returns different results in miniMr mode > > > Key: HIVE-1561 > URL: https://issues.apache.org/jira/browse/HIVE-1561 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: He Yongqiang > > follow on to HIVE-1523: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test > POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer > join smb_bucket4_2 b on a.key = b.key > official results: > 4 val_356 NULL NULL > NULL NULL 484 val_169 > 2000 val_169 NULL NULL > NULL NULL 3000 val_169 > 4000 val_125 NULL NULL > in minimr mode: > 2000 val_169 NULL NULL > 4 val_356 NULL NULL > 2000 val_169 NULL NULL > 4000 val_125 NULL NULL > NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-741: - Attachment: patch-741-2.txt The patch fixes SMBMapJoinOperator also. I modified compareKeys(ArrayList k1, ArrayList k2) to do the following:
{code}
if (hasNullElements(k1) && hasNullElements(k2)) {
  return -1; // just return k1 is smaller than k2
} else if (hasNullElements(k1)) {
  return (0 - k2.size());
} else if (hasNullElements(k2)) {
  return k1.size();
}
... // the existing code
{code}
Does the above make sense? Updated the testcase with SMB join queries. When I run the SMB join on my local machine (pseudo-distributed mode), I get different results. I think that is mostly because of HIVE-1561. I will update the issue with my findings. > NULL is not handled correctly in join > - > > Key: HIVE-741 > URL: https://issues.apache.org/jira/browse/HIVE-741 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Amareshwari Sriramadasu > Attachments: patch-741-1.txt, patch-741-2.txt, patch-741.txt, > smbjoin_nulls.q.txt > > > With the following data in table input4_cb: > Key Value > -- > NULL 325 > 18 NULL > The following query: > {code} > select * from input4_cb a join input4_cb b on a.key = b.value; > {code} > returns the following result: > NULL 325 18 NULL > The correct result should be an empty set. > When 'null' is replaced by '' it works. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
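The proposed comparison can be sketched in Python (an illustrative model of the snippet above, not Hive's actual Java; `has_null_elements` mirrors the `hasNullElements` helper, and the trailing element-wise comparison stands in for "the existing code"):

```python
def has_null_elements(key):
    # True if any join-key column is NULL (None here)
    return any(e is None for e in key)

def compare_keys(k1, k2):
    # NULL join keys never match anything under SQL semantics, so a
    # null-bearing key is reported as "smaller" to advance past it
    # without ever emitting a matched row.
    if has_null_elements(k1) and has_null_elements(k2):
        return -1  # just report k1 as smaller than k2
    elif has_null_elements(k1):
        return 0 - len(k2)
    elif has_null_elements(k2):
        return len(k1)
    # stand-in for the existing element-wise comparison
    return (k1 > k2) - (k1 < k2)
```

For example, `compare_keys([None], [None])` returns -1, so two NULL keys are never treated as equal and never join.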
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900161#action_12900161 ] He Yongqiang commented on HIVE-1561: This is the complete result from Hive's smb_mapjoin_8.q.out; it's correct:
{noformat}
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket4_2
POSTHOOK: Input: default@smb_bucket4_1
POSTHOOK: Output: file:/tmp/jssarma/hive_2010-07-21_12-02-34_137_8141051139723931378/1
POSTHOOK: Lineage: smb_bucket4_1.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_1.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
4  val_356  NULL  NULL
NULL  NULL  484  val_169
2000  val_169  NULL  NULL
NULL  NULL  3000  val_169
4000  val_125  NULL  NULL
NULL  NULL  5000  val_125
{noformat}
> smb_mapjoin_8.q returns different results in miniMr mode > > > Key: HIVE-1561 > URL: https://issues.apache.org/jira/browse/HIVE-1561 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: He Yongqiang > > follow on to HIVE-1523: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test > POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer > join smb_bucket4_2 b on a.key = b.key > official results: > 4 val_356 NULL NULL > NULL NULL 484 val_169 > 2000 val_169 NULL NULL > NULL NULL 3000 val_169 > 4000 val_125 NULL NULL > in minimr mode: > 2000 val_169 NULL NULL > 4 val_356 NULL NULL > 2000 val_169 NULL NULL > 4000 val_125 NULL NULL > NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.3.patch small change - fix 0.20 version match to pick the right jetty version. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: > File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900141#action_12900141 ] John Sichi commented on HIVE-1523: -- Yeah, a shortreg/longreg split would be good. The challenge is to keep longreg healthy since breakages don't get caught with every checkin, so we'll need (a) automation to run it constantly and report failures (b) people to actually fix failures in a timely fashion > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: > File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1563) HBase tests broken
[ https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1563: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks John -- running all the tests now to see if we need more log file updates > HBase tests broken > -- > > Key: HIVE-1563 > URL: https://issues.apache.org/jira/browse/HIVE-1563 > Project: Hadoop Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > Attachments: HIVE-1563.1.patch > > > Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1546: Assignee: Ashutosh Chauhan > Ability to plug custom Semantic Analyzers for Hive Grammar > -- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: hive-1546.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1546: - Fix Version/s: 0.7.0 Affects Version/s: 0.7.0 > Ability to plug custom Semantic Analyzers for Hive Grammar > -- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: hive-1546.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900138#action_12900138 ] Namit Jain commented on HIVE-1561: -- Looked at the data in detail. The tables should be:
smb_bucket4_1
4  v356
2000  v169
4000  v125
smb_bucket4_2
484  v169
3000  v169
5000  v125
So, the above query should result in 6 rows - both of the results are wrong. > smb_mapjoin_8.q returns different results in miniMr mode > > > Key: HIVE-1561 > URL: https://issues.apache.org/jira/browse/HIVE-1561 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: He Yongqiang > > follow on to HIVE-1523: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test > POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer > join smb_bucket4_2 b on a.key = b.key > official results: > 4 val_356 NULL NULL > NULL NULL 484 val_169 > 2000 val_169 NULL NULL > NULL NULL 3000 val_169 > 4000 val_125 NULL NULL > in minimr mode: > 2000 val_169 NULL NULL > 4 val_356 NULL NULL > 2000 val_169 NULL NULL > 4000 val_125 NULL NULL > NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
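The six-row expectation is easy to check with a small sketch (plain Python, nothing Hive-specific): since no key appears in both tables, a full outer join must emit every row from each side, null-padded.

```python
def full_outer_join(left, right):
    """Naive full outer join on (key, value) pairs, joining on key."""
    out, matched_keys = [], set()
    for lk, lv in left:
        hits = [(rk, rv) for rk, rv in right if rk == lk]
        if hits:
            matched_keys.update(rk for rk, _ in hits)
            out.extend((lk, lv, rk, rv) for rk, rv in hits)
        else:
            out.append((lk, lv, None, None))  # left row with no match
    # right rows that matched nothing on the left
    out.extend((None, None, rk, rv) for rk, rv in right if rk not in matched_keys)
    return out

smb_bucket4_1 = [(4, "val_356"), (2000, "val_169"), (4000, "val_125")]
smb_bucket4_2 = [(484, "val_169"), (3000, "val_169"), (5000, "val_125")]
rows = full_outer_join(smb_bucket4_1, smb_bucket4_2)
# no keys overlap, so all 6 rows appear, each NULL-padded on one side
```

Both the "official" 5-row output and the miniMr 5-row output drop one of these rows, which is why the comment concludes that both results are wrong.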
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900137#action_12900137 ] Ashutosh Chauhan commented on HIVE-1546: Btw, can someone assign this jira to me and add me to the list of contributors so that in future I can do that myself. > Ability to plug custom Semantic Analyzers for Hive Grammar > -- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore >Reporter: Ashutosh Chauhan > Attachments: hive-1546.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1546: --- Attachment: hive-1546.patch The attached patch adds the capability for custom semantic analysis of a query before it is handed over to Hive, plus a few other miscellaneous refactorings around it. Changes include:
* Addition of SemanticAnalyzerFactoryInterface. If the conf has a particular variable specified, a custom analyzer will be loaded and used; otherwise the existing Hive semantic analyzer will be used. So the default behavior is preserved.
* Changed the visibility of a few methods in DDLSemanticAnalyzer and SemanticAnalyzer from private to protected, as I wanted to override them in my custom analyzer.
* Changed the file format specification in the grammar so that it can optionally take two more parameters (InputDriver and OutputDriver) in addition to InputFormat and OutputFormat. These are optional, so this preserves the default behavior.
* In the file format specification, SequenceFile, TextFile and RCFile are currently supported through keywords. Expanded that production to also accept an identifier, so that support for more file formats can be provided without changing the Hive grammar every time. Currently that token results in an exception, since there are no such formats yet, but that can change when we add support for other file formats. This preserves current behavior.
Note that there are no new test cases, since it is mostly code restructuring and doesn't add or modify current behavior; passing the existing tests should suffice. I should point out that most of these changes are driven by Howl, and I would like to thank John for suggesting the initial approach for these changes.
> Ability to plug custom Semantic Analyzers for Hive Grammar > -- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore >Reporter: Ashutosh Chauhan > Attachments: hive-1546.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
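The conf-driven loading described in the patch notes follows a common pattern: read a class name from configuration, load it reflectively, and fall back to the default otherwise. A Python sketch of that pattern (the config key and function name here are hypothetical illustrations, not Hive's actual API):

```python
import importlib

def get_semantic_analyzer_factory(conf, default_factory):
    """Return a custom factory class if one is configured; otherwise fall
    back to the default, preserving existing behavior."""
    class_name = conf.get("hive.semantic.analyzer.factory")  # hypothetical key
    if not class_name:
        return default_factory
    # load "pkg.module.ClassName" reflectively, like Class.forName in Java
    module_name, _, attr = class_name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)
```

With an empty conf this returns the default factory unchanged, which is the "default behavior is preserved" property the patch description emphasizes.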
[jira] Commented: (HIVE-1563) HBase tests broken
[ https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900134#action_12900134 ] Namit Jain commented on HIVE-1563: -- running tests > HBase tests broken > -- > > Key: HIVE-1563 > URL: https://issues.apache.org/jira/browse/HIVE-1563 > Project: Hadoop Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > Attachments: HIVE-1563.1.patch > > > Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1563) HBase tests broken
[ https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1563: - Attachment: HIVE-1563.1.patch > HBase tests broken > -- > > Key: HIVE-1563 > URL: https://issues.apache.org/jira/browse/HIVE-1563 > Project: Hadoop Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > Attachments: HIVE-1563.1.patch > > > Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1563) HBase tests broken
[ https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1563: - Status: Patch Available (was: Open) > HBase tests broken > -- > > Key: HIVE-1563 > URL: https://issues.apache.org/jira/browse/HIVE-1563 > Project: Hadoop Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > Attachments: HIVE-1563.1.patch > > > Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1563) HBase tests broken
HBase tests broken -- Key: HIVE-1563 URL: https://issues.apache.org/jira/browse/HIVE-1563 Project: Hadoop Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.7.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1563) HBase tests broken
[ https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1563: - Hadoop Flags: (was: [Reviewed]) > HBase tests broken > -- > > Key: HIVE-1563 > URL: https://issues.apache.org/jira/browse/HIVE-1563 > Project: Hadoop Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > > Broken by HIVE-1548, which did not update all log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: alarming hive test failures in minimr mode
Joy, can you link the jiras from 1523? It will be easier to track them that way. Thanks, -namit From: Joydeep Sen Sarma [jssa...@facebook.com] Sent: Wednesday, August 18, 2010 2:59 PM To: hive-dev@hadoop.apache.org Subject: alarming hive test failures in minimr mode Hey Devs, Since fixing 1523 - I have been trying to run hive queries in minimr mode. I am alarmed by what I am seeing: - assertions firing deep inside hive in minimr mode - test results outright different from local mode results (and not because of limit or because of ordering). I am going to file jiras as I can - please do assign them to yourself (or whoever you think the right person is). I think we should try to get these resolved asap - they seem to indicate significant bugs in features advertised by Hive (that do not get enough coverage on real clusters). Imho - this seems way more important than new feature dev. Thanks, Joydeep
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900128#action_12900128 ] Namit Jain commented on HIVE-1293: -- Fixed a lot of bugs, added a lot of comments, and tested it with a ZooKeeper cluster of 3 nodes. select * currently performs a dirty read; we can add a new parameter to change that behavior if need be. > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, Hive does not provide a good concurrency model. The only > guarantee provided in the case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query, or the write locks can be delayed till the move > task (when the directory is actually moved). Care needs to be taken to avoid > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that it is not equivalent to snapshots; > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
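The lock-based option discussed in the issue (solution 1, with care taken for deadlocks) can be sketched as a compatibility check plus a single global acquisition order. This is an illustrative Python model only, not the patch's ZooKeeper implementation:

```python
def compatible(held, requested):
    # Shared (read) locks coexist with each other; an exclusive (write)
    # lock conflicts with every other lock on the same object.
    return held == "SHARED" and requested == "SHARED"

def acquisition_order(lock_requests):
    # Acquiring locks for a query in one global order (here: sorted by
    # object name) is the classic way to avoid deadlock between
    # concurrent queries that need overlapping lock sets.
    return sorted(lock_requests, key=lambda req: req[0])
```

Under this model a "dirty read" is simply a select * that skips the `compatible` check and reads without requesting a shared lock.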
[jira] Updated: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1293: - Status: Patch Available (was: Open) > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, Hive does not provide a good concurrency model. The only > guarantee provided in the case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query, or the write locks can be delayed till the move > task (when the directory is actually moved). Care needs to be taken to avoid > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that it is not equivalent to snapshots; > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1293: - Attachment: hive.1293.6.patch > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, Hive does not provide a good concurrency model. The only > guarantee provided in the case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query, or the write locks can be delayed till the move > task (when the directory is actually moved). Care needs to be taken to avoid > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that it is not equivalent to snapshots; > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900113#action_12900113 ] Ning Zhang commented on HIVE-1510: -- Other than the clean architecture concerns (Driver should be generic and should not assume tasks contain MR jobs), it also appears not to work when parallel execution is enabled: IOPrepareCache is thread-local, and parallel MR jobs are launched in different threads. > HiveCombineInputFormat should not use prefix matching to find the > partitionDesc for a given path > > > Key: HIVE-1510 > URL: https://issues.apache.org/jira/browse/HIVE-1510 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1510.1.patch, hive-1510.3.patch > > > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > drop table combine_3_srcpart_seq_rc; > create table combine_3_srcpart_seq_rc (key int , value string) partitioned by > (ds string, hr string) stored as sequencefile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="00") select * from src; > alter table combine_3_srcpart_seq_rc set fileformat rcfile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="001") select * from src; > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00"); > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001"); > select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key; > drop table combine_3_srcpart_seq_rc; > will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
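The thread-local concern above is easy to demonstrate in isolation. The following is a minimal sketch (not Hive's actual IOPrepareCache code): a ThreadLocal-backed cache gives every thread its own map, so entries written by the thread that prepared the query are invisible to the separate threads that launch parallel MR jobs.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalCacheDemo {
    // Each thread sees its own independent HashMap.
    private static final ThreadLocal<Map<String, String>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) {
        CACHE.get().put(key, value);
    }

    public static String get(String key) {
        return CACHE.get().get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        put("pathToPartitionInfo", "partitionDesc");
        final String[] seenFromOtherThread = new String[1];
        // Simulates a parallel MR job launched from a different thread:
        // it gets a fresh, empty per-thread map, so the lookup misses.
        Thread worker = new Thread(
            () -> seenFromOtherThread[0] = get("pathToPartitionInfo"));
        worker.start();
        worker.join();
        System.out.println("same thread:  " + get("pathToPartitionInfo"));
        System.out.println("other thread: " + seenFromOtherThread[0]);
    }
}
```

The same lookup that succeeds on the populating thread returns null on the worker thread, which is exactly the failure mode described for parallel execution.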
[jira] Updated: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1556: - Affects Version/s: 0.7.0 > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900102#action_12900102 ] John Sichi commented on HIVE-1556: -- I'll regenerate the test output and post a separate JIRA. > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900098#action_12900098 ] John Sichi commented on HIVE-1549: -- Will commit when tests pass. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
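For context, a correlation UDAF like CORR(X,Y) is typically implemented as a streaming aggregate: each mapper keeps running sums, partial states are merged, and Pearson's r is computed at the end. The sketch below illustrates that shape only; it is not the HIVE-1549 patch, and the textbook sum formulas used here are the simple (not the most numerically stable) variant.

```java
public class PearsonCorr {
    private long n;
    private double sx, sy, sxx, syy, sxy;

    // iterate(): fold one (x, y) pair into the running sums.
    public void add(double x, double y) {
        n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }

    // merge(): combine two partial aggregation states, as a UDAF's
    // merge step would across map tasks.
    public void merge(PearsonCorr o) {
        n += o.n; sx += o.sx; sy += o.sy;
        sxx += o.sxx; syy += o.syy; sxy += o.sxy;
    }

    // terminate(): Pearson's r = cov(x,y) / (stddev(x) * stddev(y)).
    public double result() {
        double cov = sxy - sx * sy / n;
        double vx  = sxx - sx * sx / n;
        double vy  = syy - sy * sy / n;
        return cov / Math.sqrt(vx * vy);
    }
}
```

Because the state is just six numbers, partial results merge associatively, which is what makes the function usable as a distributed aggregate.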
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900097#action_12900097 ] John Sichi commented on HIVE-1556: -- HBaseHandler does run with ant test on Hadoop 0.20, but not with 0.17, so it's important to run tests against both configurations. > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.2.patch With a modified list of minimr tests: I took the ones that worked from John's list, and also added a couple of tests that had 'add jar' and 'add file' commands (since their interaction with a real cluster is quite different). > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900094#action_12900094 ] Ning Zhang commented on HIVE-1556: -- This could also be because of HIVE-1548. Is HBaseHandler also excluded from ant test? > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1555: - Affects Version/s: (was: 0.5.0) Component/s: Drivers > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hadoop Hive > Issue Type: New Feature > Components: Drivers >Reporter: Bob Robertson > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1558) introducing the "dual" table
[ https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1558: - Component/s: Query Processor > introducing the "dual" table > > > Key: HIVE-1558 > URL: https://issues.apache.org/jira/browse/HIVE-1558 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Ning Zhang > > The "dual" table in MySQL and Oracle is very convenient in testing UDFs or > constructing rows without reading any other tables. > If dual is the only data source we could leverage the local mode execution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams
[ https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1518: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks Mayank! > context_ngrams() UDAF for estimating top-k contextual n-grams > - > > Key: HIVE-1518 > URL: https://issues.apache.org/jira/browse/HIVE-1518 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Mayank Lahiri > Fix For: 0.7.0 > > Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, > HIVE-1518.4.patch, HIVE-1518.5.patch > > > Create a new context_ngrams() function that generalizes the ngrams() UDAF to > allow the user to specify context around n-grams. The analogy is > "fill-in-the-blanks", and is best illustrated with an example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM > twitter; > will estimate the top-300 words that follow the phrase "i love" in a database > of tweets. The position of the null(s) specifies where to generate the n-gram > from, and can be placed anywhere. For example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", > "hate", null), 300) FROM twitter; > will estimate the top-300 word-pairs that fill in the blanks specified by > null. > POSSIBLE USES: > 1. Pre-computing search lookaheads > 2. Sentiment analysis for products or entities -- e.g., querying with context > = array("twitter", "is", null) > 3. Navigation path analysis in URL databases -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
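The "fill-in-the-blanks" matching that context_ngrams() performs can be sketched in a few lines. This is an illustration only, not the HIVE-1518 implementation (which additionally does heap-based top-k estimation over the collected blanks): slide the context pattern over a token list, and wherever every non-null pattern word matches, the tokens aligned with the nulls are the candidate blanks to be counted.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextMatch {
    // pattern entries: literal words must match exactly; null means "blank".
    // Returns, for each match position, the words that filled the blanks.
    public static List<List<String>> blanks(String[] tokens, String[] pattern) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i + pattern.length <= tokens.length; i++) {
            List<String> filled = new ArrayList<>();
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (pattern[j] == null) {
                    filled.add(tokens[i + j]);      // a blank: capture the word
                } else if (!pattern[j].equals(tokens[i + j])) {
                    match = false;                  // literal word mismatch
                    break;
                }
            }
            if (match) out.add(filled);
        }
        return out;
    }
}
```

Running this over the sentence "i love hive a lot" with the pattern {"i", "love", null} captures the single blank "hive", matching the SELECT examples above.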
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900086#action_12900086 ] John Sichi commented on HIVE-1556: -- I got a failure running ant test just now for the HBase portion. [junit] diff -a -I file: -I pfile: -I /tmp/ -I invalidscheme: -I lastUpdate\ Time -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeEx\ ception -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [\ 0-9]* more /data/users/jsichi/open/commit-trunk/build/hbase-handler/test/logs/h\ base-handler/hbase_bulk.m.out /data/users/jsichi/open/commit-trunk/hbase-handle\ r/src/test/results/hbase_bulk.m.out [junit] 109,110d108 [junit] < PREHOOK: Input: defa...@hbsort [junit] < PREHOOK: Output: defa...@hbsort [junit] 118d115 [junit] < POSTHOOK: Input: defa...@hbsort [junit] 126,127d122 [junit] < PREHOOK: Input: defa...@hbpartition [junit] < PREHOOK: Output: defa...@hbpartition [junit] 130d124 [junit] < POSTHOOK: Input: defa...@hbpartition [junit] Exception: Client execution results failed with error code = 1 [junit] junit.framework.AssertionFailedError: Client execution results fail\ ed with error code = 1 [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliD\ river_hbase_bulk(TestHBaseMinimrCliDriver.java:102) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce\ ssorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe\ thodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at junit.framework.TestCase.runTest(TestCase.java:154) [junit] at junit.framework.TestCase.runBare(TestCase.java:127) > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing 
Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900075#action_12900075 ] Mayank Lahiri commented on HIVE-1549: - +1 looks good to me. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900074#action_12900074 ] He Yongqiang commented on HIVE-1510: About the additional hashmap added: it is used to match a path to its partitionDesc by discarding the partitionDesc's scheme information. In the long run, we should normalize all input paths so that they contain full scheme and authority information. This is a must for Hive to work with multiple HDFS clusters. > HiveCombineInputFormat should not use prefix matching to find the > partitionDesc for a given path > > > Key: HIVE-1510 > URL: https://issues.apache.org/jira/browse/HIVE-1510 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1510.1.patch, hive-1510.3.patch > > > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > drop table combine_3_srcpart_seq_rc; > create table combine_3_srcpart_seq_rc (key int , value string) partitioned by > (ds string, hr string) stored as sequencefile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="00") select * from src; > alter table combine_3_srcpart_seq_rc set fileformat rcfile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="001") select * from src; > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00"); > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001"); > select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key; > drop table combine_3_srcpart_seq_rc; > will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
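The repro script above makes the failure mode concrete: the partition path for hr="00" is a string prefix of the path for hr="001", so a startsWith-based lookup can return the wrong partition's descriptor (and hence the wrong file format). The following is a hedged sketch of that pitfall, not Hive's actual lookup code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixLookupDemo {
    // Naive prefix matching: return the desc of the first registered
    // partition path that is a string prefix of the file's path.
    public static String byPrefix(Map<String, String> pathToDesc, String file) {
        for (Map.Entry<String, String> e : pathToDesc.entrySet()) {
            if (file.startsWith(e.getKey())) return e.getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("/t/ds=2010-08-03/hr=00", "SequenceFile");
        m.put("/t/ds=2010-08-03/hr=001", "RCFile");
        // The file lives in partition hr=001 (stored as RCFile), but the
        // hr=00 entry matches first because it is a string prefix.
        System.out.println(byPrefix(m, "/t/ds=2010-08-03/hr=001/000000_0"));
    }
}
```

An exact-match lookup on the file's parent directory (or path-component-aware matching) avoids this, which is why the patch moves away from prefix matching.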
[jira] Created: (HIVE-1562) sample10.q fails in minimr mode
sample10.q fails in minimr mode --- Key: HIVE-1562 URL: https://issues.apache.org/jira/browse/HIVE-1562 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma followup from HIVE-1523. This is probably because of CombineHiveInputFormat: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select *\ from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: type: QUERY 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12 2010-08-18 15:13:54,704 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applicati\ ons should implement Tool for the same. 
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /default-rack at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2010-08-18 15:13:56,566 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with errors 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedT\ ask -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-192) Add TIMESTAMP column type
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Sundar Sarkar updated HIVE-192: - Attachment: Hive-192.patch.txt This is just the changes for thrift layer to see only string types being passed back and forth for Timestamp type. > Add TIMESTAMP column type > - > > Key: HIVE-192 > URL: https://issues.apache.org/jira/browse/HIVE-192 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Johan Oskarsson >Assignee: Shyam Sundar Sarkar > Attachments: create_2.q.txt, Hive-192.patch.txt, > TIMESTAMP_specification.txt > > > create table something2 (test timestamp); > ERROR: DDL specifying type timestamp which has not been defined > java.lang.RuntimeException: specifying type timestamp which has not been > defined > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101) > at > org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180) > at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202) > at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641) > at 
org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-192) Add TIMESTAMP column type
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Sundar Sarkar updated HIVE-192: - Status: Patch Available (was: In Progress) This is just the initial changes for others to look at and suggest. I need suggestions about string to Timestamp conversion within Dynamic SerDe layer. > Add TIMESTAMP column type > - > > Key: HIVE-192 > URL: https://issues.apache.org/jira/browse/HIVE-192 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Johan Oskarsson >Assignee: Shyam Sundar Sarkar > Attachments: create_2.q.txt, TIMESTAMP_specification.txt > > > create table something2 (test timestamp); > ERROR: DDL specifying type timestamp which has not been defined > java.lang.RuntimeException: specifying type timestamp which has not been > defined > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439) > at > org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101) > at > org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180) > at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202) > at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641) > at 
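On the string-to-Timestamp conversion question: one minimal option is to lean on java.sql.Timestamp, which already parses and prints the JDBC escape format "yyyy-mm-dd hh:mm:ss[.f...]", so strings could cross the Thrift boundary and be materialized lazily in the SerDe. This is a sketch of that idea only, not code from the HIVE-192 patch:

```java
import java.sql.Timestamp;

public class TimestampRoundTrip {
    // Parse the JDBC escape format; throws IllegalArgumentException
    // on malformed input, so callers can decide how to handle bad rows.
    public static Timestamp fromString(String s) {
        return Timestamp.valueOf(s);
    }

    // Timestamp.toString() emits the same format back, so the
    // string <-> Timestamp round trip is lossless down to nanoseconds.
    public static String format(Timestamp t) {
        return t.toString();
    }
}
```

Whether the conversion should happen eagerly at deserialization time or lazily on first use is exactly the design question raised above.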
org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang reassigned HIVE-1561: -- Assignee: He Yongqiang > smb_mapjoin_8.q returns different results in miniMr mode > > > Key: HIVE-1561 > URL: https://issues.apache.org/jira/browse/HIVE-1561 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: He Yongqiang > > follow on to HIVE-1523: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test > POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer > join smb_bucket4_2 b on a.key = b.key > official results: > 4 val_356 NULL NULL > NULL NULL 484 val_169 > 2000 val_169 NULL NULL > NULL NULL 3000 val_169 > 4000 val_125 NULL NULL > in minimr mode: > 2000 val_169 NULL NULL > 4 val_356 NULL NULL > 2000 val_169 NULL NULL > 4000 val_125 NULL NULL > NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900063#action_12900063 ] He Yongqiang commented on HIVE-1510: >>the IOPrepareCache is cleared in Driver, which should only contain generic >>code irrespect to task types. Can you do it in ExecDriver.execute()? This >>will new cache is only used in ExecDriver anyways. ExecDriver is per map-reduce task. Driver is per query. We should do this for query granularity. I think the pathToPartitionDesc is also per query map? >>some comments on why you need a new hash map keyed with the paths only will >>be helpful. will do it in a next patch. > HiveCombineInputFormat should not use prefix matching to find the > partitionDesc for a given path > > > Key: HIVE-1510 > URL: https://issues.apache.org/jira/browse/HIVE-1510 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1510.1.patch, hive-1510.3.patch > > > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > drop table combine_3_srcpart_seq_rc; > create table combine_3_srcpart_seq_rc (key int , value string) partitioned by > (ds string, hr string) stored as sequencefile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="00") select * from src; > alter table combine_3_srcpart_seq_rc set fileformat rcfile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="001") select * from src; > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00"); > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001"); > select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key; > drop table combine_3_srcpart_seq_rc; > will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
smb_mapjoin_8.q returns different results in miniMr mode Key: HIVE-1561 URL: https://issues.apache.org/jira/browse/HIVE-1561 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma follow on to HIVE-1523: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key official results: 4 val_356 NULL NULL NULL NULL 484 val_169 2000 val_169 NULL NULL NULL NULL 3000 val_169 4000 val_125 NULL NULL in minimr mode: 2000 val_169 NULL NULL 4 val_356 NULL NULL 2000 val_169 NULL NULL 4000 val_125 NULL NULL NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900062#action_12900062 ] Pierre Huyn commented on HIVE-1549: --- Thanks for your comments. The items have been taken care of in the patch #2. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
alarming hive test failures in minimr mode
Hey Devs, Since fixing 1523 I have been trying to run hive queries in minimr mode, and I am alarmed by what I am seeing: - assertions firing deep inside Hive in minimr mode - test results outright different from local-mode results (and not because of limit or because of ordering). I am going to file JIRAs as I can - please do assign them to yourself (or whoever you think the right person is). I think we should try to get these resolved ASAP - they seem to indicate significant bugs in features advertised by Hive (that do not get enough coverage on real clusters). IMHO this seems way more important than new feature development. Thanks, Joydeep
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900057#action_12900057 ] Ning Zhang commented on HIVE-1510: -- As discussed offline with Yongqiang, we should clean up pathToPartitionInfo to contain only canonical representations for each partition. This could result in much cleaner code. If we do that, IOPrepareCache is not needed at all and the function getPartitionDescFromPath becomes a simple hash lookup. We can make it a follow-up JIRA, along with cleaning up the unnecessary info in pathToPartitionInfo as well. Here are some comments on the current patch: - the IOPrepareCache is cleared in Driver, which should only contain generic code irrespective of task types. Can you do it in ExecDriver.execute()? This new cache is only used in ExecDriver anyway. - some comments on why you need a new hash map keyed by the paths only would be helpful. > HiveCombineInputFormat should not use prefix matching to find the > partitionDesc for a given path > > > Key: HIVE-1510 > URL: https://issues.apache.org/jira/browse/HIVE-1510 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1510.1.patch, hive-1510.3.patch > > > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > drop table combine_3_srcpart_seq_rc; > create table combine_3_srcpart_seq_rc (key int , value string) partitioned by > (ds string, hr string) stored as sequencefile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="00") select * from src; > alter table combine_3_srcpart_seq_rc set fileformat rcfile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="001") select * from src; > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00"); > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001"); > select * from combine_3_srcpart_seq_rc where 
ds="2010-08-03" order by key; > drop table combine_3_srcpart_seq_rc; > will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
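The failure above comes from prefix matching: the directory name `hr=00` is a string prefix of `hr=001`, so a prefix scan can return the wrong partitionDesc. Ning's suggestion, keeping only canonical path representations so the lookup degenerates to an exact hash probe, can be sketched as follows (a minimal standalone illustration; the class and function names are hypothetical, not Hive's actual API):

```python
from urllib.parse import urlparse

def canonicalize(path: str) -> str:
    # Normalize to one canonical form: drop the scheme/authority and any
    # trailing slash, so equal partition directories always produce the
    # same key. (Illustrative; Hive's actual normalization differs.)
    return urlparse(path).path.rstrip("/")

class PartitionLookup:
    """Exact-match lookup on canonical keys: hr=001 can never be confused
    with hr=00, which a startswith() prefix scan would allow."""

    def __init__(self):
        self.path_to_partition = {}

    def put(self, path, partition_desc):
        self.path_to_partition[canonicalize(path)] = partition_desc

    def get(self, path):
        # Plain hash lookup -- no prefix scan over all registered paths.
        return self.path_to_partition.get(canonicalize(path))

lookup = PartitionLookup()
lookup.put("hdfs://nn/warehouse/t/ds=2010-08-03/hr=00", "seqfile-part")
lookup.put("hdfs://nn/warehouse/t/ds=2010-08-03/hr=001", "rcfile-part")
```

With prefix matching, a query for the `hr=001` path could match the `hr=00` entry first and pick up the wrong input format (sequencefile vs. rcfile), which is exactly the repro in the issue description.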
[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pierre Huyn updated HIVE-1549: -- Attachment: HIVE-1549.2.patch Fixed the 2 issues from Mayank's review. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900048#action_12900048 ] Joydeep Sen Sarma commented on HIVE-1523: - I am running through the above qfiles to see what executes successfully on minimr (because many don't). One concern is the length of the tests: I think we need to divide our tests into a short and a long regression; otherwise the development cycle is severely impacted if everything has to be tested on every iteration. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1560) binaryoutputformat.q failure in minimr mode
binaryoutputformat.q failure in minimr mode --- Key: HIVE-1560 URL: https://issues.apache.org/jira/browse/HIVE-1560 Project: Hadoop Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Joydeep Sen Sarma this is a followup to HIVE-1523. ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=binary_output_format.q test fails in a significant manner. all the rows are flattened out into one row: ntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/jssarma/hive_testing/build/ql/test/logs/clientposit\ ive/binary_output_format.q.out /data/users/jssarma/hive_testing/ql/src/test/results/clientpositive/binary_output_format.q.out [junit] 313c313,812 [junit] < 238 val_23886 val_86311 val_31127 val_27165 val_165409 ... [junit] --- [junit] > 238 val_238 [junit] > 86 val_86 [junit] > 311 val_311 [junit] > 27 val_27 [junit] > 165 val_165 ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900039#action_12900039 ] Edward Capriolo commented on HIVE-1555: --- I wonder if this could end up being a very effective way to query shared data stores. I think I saw something like this in Futurama: "Don't worry about querying blank, let me worry about querying blank." http://www.google.com/url?sa=t&source=web&cd=2&ved=0CBcQFjAB&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DB5cAwTEEGNE&ei=Qk9sTLAThIqXB__DzDw&usg=AFQjCNH_TOUS1cl6t0gZXefRURw0a_feZg > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Bob Robertson > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900029#action_12900029 ] Mayank Lahiri commented on HIVE-1549: - Nice job Pierre! Just a couple of very trivial points: -- UDAF file, line #116 and line #123, could you amend the error message to indicate that only numeric types are accepted (string is also included as of now). -- I don't think you need the private boolean warned, line #273 Otherwise, it looks good and the numbers work out. Incidentally, for the future, if your UDAF only stores a small number of values as a partial aggregation, you might just want to consider serializing the values as a list of doubles instead of a struct in terminatePartial() and merge(). It'll probably save you some time and reduce the amount of code in those parts. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900018#action_12900018 ] John Sichi edited comment on HIVE-1523 at 8/18/10 4:30 PM: --- +1. Last time I talked to Ning about this, my take was that we should be able to re-run any subset of tests in either mode (without needing test codegen for it), but for now we can just get things working again this way. Some candidates for existing tests to add in to the minimr suite: jsichi-mac:clientpositive jsichi$ grep reducer *.q bucket1.q:set hive.exec.reducers.max = 200; bucket2.q:set hive.exec.reducers.max = 1; bucket3.q:set hive.exec.reducers.max = 1; bucket4.q:set hive.exec.reducers.max = 1; bucketmapjoin6.q:set hive.exec.reducers.max=1; disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1; reduce_deduplicate.q:set hive.exec.reducers.max = 1; sample10.q:set hive.exec.reducers.max=4; smb_mapjoin_6.q:set hive.exec.reducers.max = 1; smb_mapjoin_7.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; udaf_percentile_approx.q:set hive.exec.reducers.max=4 was (Author: jvs): +1. Last time I talked to Ning about this, my take was that we should be able to re-run any subset of tests in either mode (without needing test codegen for it), but for now we can just get things working again this way. 
Some candidates for existing tests to adding to the minimr suite: jsichi-mac:clientpositive jsichi$ grep reducer *.q bucket1.q:set hive.exec.reducers.max = 200; bucket2.q:set hive.exec.reducers.max = 1; bucket3.q:set hive.exec.reducers.max = 1; bucket4.q:set hive.exec.reducers.max = 1; bucketmapjoin6.q:set hive.exec.reducers.max=1; disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1; reduce_deduplicate.q:set hive.exec.reducers.max = 1; sample10.q:set hive.exec.reducers.max=4; smb_mapjoin_6.q:set hive.exec.reducers.max = 1; smb_mapjoin_7.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; udaf_percentile_approx.q:set hive.exec.reducers.max=4 > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900018#action_12900018 ] John Sichi commented on HIVE-1523: -- +1. Last time I talked to Ning about this, my take was that we should be able to re-run any subset of tests in either mode (without needing test codegen for it), but for now we can just get things working again this way. Some candidates for existing tests to add to the minimr suite: jsichi-mac:clientpositive jsichi$ grep reducer *.q bucket1.q:set hive.exec.reducers.max = 200; bucket2.q:set hive.exec.reducers.max = 1; bucket3.q:set hive.exec.reducers.max = 1; bucket4.q:set hive.exec.reducers.max = 1; bucketmapjoin6.q:set hive.exec.reducers.max=1; disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1; reduce_deduplicate.q:set hive.exec.reducers.max = 1; sample10.q:set hive.exec.reducers.max=4; smb_mapjoin_6.q:set hive.exec.reducers.max = 1; smb_mapjoin_7.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; smb_mapjoin_8.q:set hive.exec.reducers.max = 1; udaf_percentile_approx.q:set hive.exec.reducers.max=4 > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. 
> at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900011#action_12900011 ] Mayank Lahiri commented on HIVE-1549: - No problem, reviewing it now... > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams
[ https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900012#action_12900012 ] John Sichi commented on HIVE-1518: -- Running through tests now. > context_ngrams() UDAF for estimating top-k contextual n-grams > - > > Key: HIVE-1518 > URL: https://issues.apache.org/jira/browse/HIVE-1518 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Mayank Lahiri > Fix For: 0.7.0 > > Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, > HIVE-1518.4.patch, HIVE-1518.5.patch > > > Create a new context_ngrams() function that generalizes the ngrams() UDAF to > allow the user to specify context around n-grams. The analogy is > "fill-in-the-blanks", and is best illustrated with an example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM > twitter; > will estimate the top-300 words that follow the phrase "i love" in a database > of tweets. The position of the null(s) specifies where to generate the n-gram > from, and can be placed anywhere. For example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", > "hate", null), 300) FROM twitter; > will estimate the top-300 word-pairs that fill in the blanks specified by > null. > POSSIBLE USES: > 1. Pre-computing search lookaheads > 2. Sentiment analysis for products or entities -- e.g., querying with context > = array("twitter", "is", null) > 3. Navigation path analysis in URL databases -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
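The fill-in-the-blanks behaviour the HIVE-1518 description walks through can be illustrated with a small standalone sketch. This is pure Python and not the Hive implementation: the real UDAF *estimates* the top-k over huge data, whereas this sketch counts exactly, and the function name mirrors the UDAF only for readability:

```python
from collections import Counter

def context_ngrams(sentences, context, k):
    # `context` is a word pattern with None marking the blanks, e.g.
    # ["i", "love", None]. For every window of the same length that
    # matches the fixed words, count the tuple of words in the blanks.
    n = len(context)
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if all(c is None or w == c for w, c in zip(window, context)):
                filler = tuple(w for w, c in zip(window, context)
                               if c is None)
                counts[filler] += 1
    # Exact top-k; the Hive UDAF approximates this step.
    return counts.most_common(k)

tweets = [["i", "love", "hive"], ["i", "love", "hive"], ["i", "love", "pig"]]
top = context_ngrams(tweets, ["i", "love", None], 2)
```

Here `top` ranks the words following "i love", matching the first example query in the issue; placing a second `None` in the context would make the counted fillers word pairs instead.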
[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1299#action_1299 ] John Sichi commented on HIVE-1549: -- Mayank, if you get time, here's another one to take a look at. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1291#action_1291 ] John Sichi commented on HIVE-1555: -- For an implementation possibility, see http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Bob Robertson > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1010) Implement INFORMATION_SCHEMA in Hive
[ https://issues.apache.org/jira/browse/HIVE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1288#action_1288 ] John Sichi commented on HIVE-1010: -- See HIVE-1555 for a JDBC storage handler. > Implement INFORMATION_SCHEMA in Hive > > > Key: HIVE-1010 > URL: https://issues.apache.org/jira/browse/HIVE-1010 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore, Query Processor, Server Infrastructure >Reporter: Jeff Hammerbacher > > INFORMATION_SCHEMA is part of the SQL92 standard and would be useful to > implement using our metastore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1558) introducing the "dual" table
[ https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1281#action_1281 ] John Sichi commented on HIVE-1558: -- SQL standard has the VALUES clause, so you can do INSERT INTO t VALUES(3, 'hi'); -- inserts one row INSERT INTO t VALUES (3, 'hi'), (4, 'bye'); -- inserts two rows and SELECT * FROM (VALUES(3, 'hi'), (4, 'bye')) -- inline table results If we add dual, it would also be nice to support at least the standard INSERT syntax since that has been around forever. > introducing the "dual" table > > > Key: HIVE-1558 > URL: https://issues.apache.org/jira/browse/HIVE-1558 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Ning Zhang > > The "dual" table in MySQL and Oracle is very convenient in testing UDFs or > constructing rows without reading any other tables. > If dual is the only data source we could leverage the local mode execution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1510: --- Attachment: hive-1510.3.patch > HiveCombineInputFormat should not use prefix matching to find the > partitionDesc for a given path > > > Key: HIVE-1510 > URL: https://issues.apache.org/jira/browse/HIVE-1510 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1510.1.patch, hive-1510.3.patch > > > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > drop table combine_3_srcpart_seq_rc; > create table combine_3_srcpart_seq_rc (key int , value string) partitioned by > (ds string, hr string) stored as sequencefile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="00") select * from src; > alter table combine_3_srcpart_seq_rc set fileformat rcfile; > insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", > hr="001") select * from src; > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00"); > desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001"); > select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key; > drop table combine_3_srcpart_seq_rc; > will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899975#action_12899975 ] Tim Perkins commented on HIVE-1555: --- This sounds great. We would love to be able to easily integrate our existing RDBMS reporting data directly into Hive. Getting everything from one frontend connected to Hive would make things much simpler. > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Bob Robertson > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Status: Patch Available (was: In Progress) > More generic and efficient merge method > --- > > Key: HIVE-1307 > URL: https://issues.apache.org/jira/browse/HIVE-1307 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, > HIVE-1307_java_only.patch > > > Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is > create to read the input files and output to one reducer for merging. This MR > job is created at compile time and one MR job for one partition. In the case > of dynamic partition case, multiple partitions could be created at execution > time and generating merging MR job at compile time is impossible. > We should generalize the merge framework to allow multiple partitions and > most of the time a map-only job should be sufficient if we use > CombineHiveInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Attachment: HIVE-1307.2.patch Uploading a new full patch HIVE-1307.2.patch, containing the following additional changes: - more log file changes due to svn up to the latest revision (mostly due to conflict with another patch on lineage hooks). - minor change in FileUtils.java to include '{' and ']' as special characters to escape when they are used as partition column values. > More generic and efficient merge method > --- > > Key: HIVE-1307 > URL: https://issues.apache.org/jira/browse/HIVE-1307 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, > HIVE-1307_java_only.patch > > > Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is > create to read the input files and output to one reducer for merging. This MR > job is created at compile time and one MR job for one partition. In the case > of dynamic partition case, multiple partitions could be created at execution > time and generating merging MR job at compile time is impossible. > We should generalize the merge framework to allow multiple partitions and > most of the time a map-only job should be sufficient if we use > CombineHiveInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
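The map-only merge that HIVE-1307 describes relies on grouping many small output files into a few large splits so a single mapper can rewrite each group. A minimal, Hive-independent sketch of that grouping step (the greedy policy and function name are illustrative; CombineHiveInputFormat's actual split computation also considers block locality and pool boundaries):

```python
def plan_merge_splits(file_sizes, target_split_bytes):
    # Greedily pack files into splits of roughly target size, the way a
    # combine input format groups small inputs so that one map task can
    # rewrite each group as a single larger file (no reducer needed).
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > target_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits

files = [("part-00000", 10), ("part-00001", 20), ("part-00002", 90),
         ("part-00003", 15)]
splits = plan_merge_splits(files, target_split_bytes=100)
```

Because the grouping happens at execution time over whatever files exist, it also covers dynamic partition inserts, where the set of partitions (and hence files) is unknown at compile time.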
[jira] Commented: (HIVE-1559) Contrib tests not run as part of 'ant test'
[ https://issues.apache.org/jira/browse/HIVE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899961#action_12899961 ] Ning Zhang commented on HIVE-1559: -- BTW, TestContribParse[Negative], TestContribNegativeCliDriver should also be included in 'ant test'. > Contrib tests not run as part of 'ant test' > --- > > Key: HIVE-1559 > URL: https://issues.apache.org/jira/browse/HIVE-1559 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain > > Copying from https://issues.apache.org/jira/browse/HIVE-1556 > >> BTW, if I run 'ant test' in hive's root directory, it seems the > >> TestContrib* were not tested. Is it expected? > TestContribCliDriver should be run as part of 'ant test' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1559) Contrib tests not run as part of 'ant test'
Contrib tests not run as part of 'ant test' --- Key: HIVE-1559 URL: https://issues.apache.org/jira/browse/HIVE-1559 Project: Hadoop Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Namit Jain Copying from https://issues.apache.org/jira/browse/HIVE-1556 >> BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* >> were not tested. Is it expected? TestContribCliDriver should be run as part of 'ant test' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899953#action_12899953 ] Namit Jain commented on HIVE-1556: -- >> BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* >> were not tested. Is it expected? No, this is not expected. Because of this, we missed it in the first place. I will file a follow-up on this > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1556: - Status: Resolved (was: Patch Available) Fix Version/s: 0.7.0 Resolution: Fixed Committed. Thanks Namit! > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899943#action_12899943 ] Ning Zhang commented on HIVE-1556: -- All the TestContrib* test cases passed in Hive root directory. > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899941#action_12899941 ] Ning Zhang commented on HIVE-1556: -- I cleaned up and ran it again under contrib/, and there are compilation errors (QTestUtil was not found), so I'm not sure running it inside contrib is a good approach; I'm running it in hive's root directory instead. BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* were not tested. Is it expected? > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pierre Huyn updated HIVE-1549: -- Status: Patch Available (was: In Progress) Release Note: This CORR udaf is implemented using a stable one-pass algorithm, similar to the one used in the COVAR_POP udaf. This release is ready for code review. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).
[ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pierre Huyn updated HIVE-1549: -- Attachment: HIVE-1549.1.patch This CORR UDAF is implemented using a one-pass stable algorithm, very similar to the implementation of the COVAR_POP UDAF. This code release is now ready for review. > Add ANSI SQL correlation aggregate function CORR(X,Y). > -- > > Key: HIVE-1549 > URL: https://issues.apache.org/jira/browse/HIVE-1549 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Pierre Huyn >Assignee: Pierre Huyn > Fix For: 0.7.0 > > Attachments: HIVE-1549.1.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > Aggregate function that computes the Pearson's coefficient of correlation > between a set of number pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
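The one-pass stable algorithm mentioned above can be sketched outside Hive with running means plus centered second moments and a co-moment, which is what makes the partial aggregates mergeable across mappers. The method names echo the UDAF evaluator lifecycle (iterate / merge / terminate) but the class itself is an illustrative sketch, not Hive's code:

```python
import math

class CorrState:
    """Partial-aggregation state for one-pass Pearson correlation:
    count, running means, centered second moments, and the co-moment."""

    def __init__(self):
        self.n = 0
        self.mx = self.my = 0.0
        self.m2x = self.m2y = self.cxy = 0.0

    def iterate(self, x, y):
        # Welford-style update: numerically stable, single pass.
        self.n += 1
        dx = x - self.mx          # deviation from the OLD mean
        dy = y - self.my
        self.mx += dx / self.n
        self.my += dy / self.n
        self.m2x += dx * (x - self.mx)   # old-mean * new-mean deviations
        self.m2y += dy * (y - self.my)
        self.cxy += dx * (y - self.my)

    def merge(self, other):
        # Combine two partial states (what merge() enables in a UDAF).
        if other.n == 0:
            return
        if self.n == 0:
            self.__dict__.update(other.__dict__)
            return
        n = self.n + other.n
        dx, dy = other.mx - self.mx, other.my - self.my
        f = self.n * other.n / n
        self.m2x += other.m2x + dx * dx * f
        self.m2y += other.m2y + dy * dy * f
        self.cxy += other.cxy + dx * dy * f
        self.mx += dx * other.n / n
        self.my += dy * other.n / n
        self.n = n

    def terminate(self):
        # Undefined for fewer than two pairs or a constant series.
        if self.n < 2 or self.m2x == 0.0 or self.m2y == 0.0:
            return None
        return self.cxy / math.sqrt(self.m2x * self.m2y)

a = CorrState()
for x, y in [(1.0, 2.0), (2.0, 4.0)]:
    a.iterate(x, y)
b = CorrState()
b.iterate(3.0, 6.0)
a.merge(b)   # merging partials gives the same result as one pass over all pairs
```

The merge step also shows why Mayank's earlier review note applies: the partial state is just six numbers, so serializing it as a list of doubles in terminatePartial() is a reasonable alternative to a struct.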
[jira] Created: (HIVE-1558) introducing the "dual" table
introducing the "dual" table Key: HIVE-1558 URL: https://issues.apache.org/jira/browse/HIVE-1558 Project: Hadoop Hive Issue Type: New Feature Reporter: Ning Zhang The "dual" table in MySQL and Oracle is very convenient for testing UDFs or constructing rows without reading any other tables. If dual is the only data source, we could also leverage local-mode execution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899928#action_12899928 ] Namit Jain commented on HIVE-1556: -- Both of them should work. Can you post some of the diffs? > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams
[ https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1518: Attachment: HIVE-1518.5.patch > context_ngrams() UDAF for estimating top-k contextual n-grams > - > > Key: HIVE-1518 > URL: https://issues.apache.org/jira/browse/HIVE-1518 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Mayank Lahiri > Fix For: 0.7.0 > > Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, > HIVE-1518.4.patch, HIVE-1518.5.patch > > > Create a new context_ngrams() function that generalizes the ngrams() UDAF to > allow the user to specify context around n-grams. The analogy is > "fill-in-the-blanks", and is best illustrated with an example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM > twitter; > will estimate the top-300 words that follow the phrase "i love" in a database > of tweets. The position of the null(s) specifies where to generate the n-gram > from, and can be placed anywhere. For example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", > "hate", null), 300) FROM twitter; > will estimate the top-300 word-pairs that fill in the blanks specified by > null. > POSSIBLE USES: > 1. Pre-computing search lookaheads > 2. Sentiment analysis for products or entities -- e.g., querying with context > = array("twitter", "is", null) > 3. Navigation path analysis in URL databases -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams
[ https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1518: Status: Patch Available (was: Open) It was the new hook format. This should fix it. > context_ngrams() UDAF for estimating top-k contextual n-grams > - > > Key: HIVE-1518 > URL: https://issues.apache.org/jira/browse/HIVE-1518 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Mayank Lahiri > Fix For: 0.7.0 > > Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, > HIVE-1518.4.patch, HIVE-1518.5.patch > > > Create a new context_ngrams() function that generalizes the ngrams() UDAF to > allow the user to specify context around n-grams. The analogy is > "fill-in-the-blanks", and is best illustrated with an example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM > twitter; > will estimate the top-300 words that follow the phrase "i love" in a database > of tweets. The position of the null(s) specifies where to generate the n-gram > from, and can be placed anywhere. For example: > SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", > "hate", null), 300) FROM twitter; > will estimate the top-300 word-pairs that fill in the blanks specified by > null. > POSSIBLE USES: > 1. Pre-computing search lookaheads > 2. Sentiment analysis for products or entities -- e.g., querying with context > = array("twitter", "is", null) > 3. Navigation path analysis in URL databases -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
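[Editorial note] For reviewers new to the feature, the matching semantics described in the issue (non-null context words must match exactly; null positions are the blanks being counted) can be sketched in a few lines. This exact-count Python version is only illustrative; the real UDAF *estimates* the top-k with bounded memory:

```python
from collections import Counter

def context_ngrams(sentences, context, k):
    """Count the word tuples that fill the None 'blanks' in `context`
    across all sentences; return the k most frequent fills."""
    counts = Counter()
    n = len(context)
    for words in sentences:
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            # A window matches when every non-None context word agrees.
            if all(c is None or c == w for c, w in zip(context, window)):
                fill = tuple(w for c, w in zip(context, window) if c is None)
                counts[fill] += 1
    return counts.most_common(k)
```

With tokenized sentences, `context_ngrams(sents, ["i", "love", None], 300)` counts the words that follow "i love", mirroring the first example in the description.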
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899921#action_12899921 ] Ning Zhang commented on HIVE-1556: -- Namit, I ran ant test -Dtestcase=TestContribCliDriver -Dhadoop.version=0.20.0 in Hive's root directory and it succeeded, but if I run ant test in the contrib subdirectory, there are diffs (related to the hooks). Do you know which is the correct way? > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899918#action_12899918 ] Namit Jain edited comment on HIVE-1293 at 8/18/10 1:17 PM: --- Agreed on the bug in getLockObjects() - will have a new patch. Filed a new jira for the followup: https://issues.apache.org/jira/browse/HIVE-1557 was (Author: namit): Agreed on the bug in getLockObjects() - will have a new patch. Filed a new patch for the followup: https://issues.apache.org/jira/browse/HIVE-1293 > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, Hive does not provide a good concurrency model. The only > guarantee provided in case of concurrent readers and writers is that > a reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query or the write locks can be delayed till the move > task (when the directory is actually moved). Care needs to be taken for > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that it is not equivalent to snapshots, > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899918#action_12899918 ] Namit Jain commented on HIVE-1293: -- Agreed on the bug in getLockObjects() - will have a new patch. Filed a new patch for the followup: https://issues.apache.org/jira/browse/HIVE-1293 > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, Hive does not provide a good concurrency model. The only > guarantee provided in case of concurrent readers and writers is that > a reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query or the write locks can be delayed till the move > task (when the directory is actually moved). Care needs to be taken for > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that it is not equivalent to snapshots, > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1557) increase concurrency
increase concurrency Key: HIVE-1557 URL: https://issues.apache.org/jira/browse/HIVE-1557 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Copying Joy's comment from https://issues.apache.org/jira/browse/HIVE-1293: A little bummed that locks need to be held for the entire query execution; that could mean a writer blocking readers for hours. Hive's query plans seem to consist of two distinct stages: 1. read a bunch of stuff, compute intermediate/final data; 2. move final data into output locations. I.e., a single query never reads what it writes (into a final output location). Even if #1 and #2 are mingled today, they can easily be put in order. In that sense, we only need to get shared locks for all the read entities involved in #1 to begin with. Once phase #1 is done, we can drop all the read locks and get the exclusive locks for all the write entities in #2, perform #2, and quit. That way exclusive locks are held for a very short duration. I think this scheme is similarly deadlock-free (now there are two independent lock acquire/release phases, and each of them can lock entities in lexicographic order). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
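[Editorial note] The two-phase scheme in Joy's comment can be sketched as follows. This is an illustrative Python sketch; the lock-manager interface (acquire_shared / acquire_exclusive / release) is hypothetical, not Hive's actual API:

```python
def run_query(read_entities, write_entities, lock_mgr, read_phase, write_phase):
    """Two-phase locking sketch: shared locks held only while reading,
    then exclusive locks held only for the short final move/write phase.
    Each phase acquires its locks in sorted (lexicographic) order, which
    keeps each phase individually deadlock-free."""
    # Phase 1: read inputs and compute intermediate/final data.
    for e in sorted(read_entities):
        lock_mgr.acquire_shared(e)
    try:
        result = read_phase()
    finally:
        for e in read_entities:
            lock_mgr.release(e)
    # Phase 2: move final data into the output locations.
    for e in sorted(write_entities):
        lock_mgr.acquire_exclusive(e)
    try:
        write_phase(result)
    finally:
        for e in write_entities:
            lock_mgr.release(e)
```

The point of the sketch: exclusive locks are never held during the long read phase, and within each phase the sorted acquisition order rules out lock-order cycles.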
[jira] Commented: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899914#action_12899914 ] Ning Zhang commented on HIVE-1556: -- +1 will commit when tests pass. > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1556: - Attachment: hive.1556.1.patch > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1556) tests broken
[ https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1556: - Status: Patch Available (was: Open) > tests broken > > > Key: HIVE-1556 > URL: https://issues.apache.org/jira/browse/HIVE-1556 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.1556.1.patch > > > Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver > is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1556) tests broken
tests broken Key: HIVE-1556 URL: https://issues.apache.org/jira/browse/HIVE-1556 Project: Hadoop Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Namit Jain Assignee: Namit Jain Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver is broken. Some test results need to be updated -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1554) Hive Unable to start due to metastore exception
[ https://issues.apache.org/jira/browse/HIVE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soundararajan Velu updated HIVE-1554: - Description: When I try to restart Hive, it sometimes fails with a weird exception around the metastore. Following is the error message that it spits out. 2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore (HiveMetaStore.java:(107)) - Unable to initialize the metastore :Exception thrown performing schema operation : Add classes to Catalog "", Schema "APP" We are using Derby 10.5.3 in server mode and have connected to Derby through this URL: jdbc:derby://{IP}:{PORT}/metastore_db;create=true. If I remove the metastore_db from derby/bin, it starts and creates a new metastore_db in derby/bin. I suspect the metastore_db gets corrupted for some reason, but I am able to open the same metastore_db through other clients like Squirrel, DBExplorer, etc. My question is whether the metastore_db is corrupted or there is some other problem, and if it is corrupted, how to recover the db. was: Hive is running after restared some time it is not starting because of metastore excepiton . The exception is like this 2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore (HiveMetaStore.java:(107)) - Unable to initialize the metastore :Exception thrown performing schema operation : Add classes to Catalog "", Schema "APP" We are using Derby10.5.3 in server mode we have connected to derby through this URL jdbc:derby://{IP}:{PORT}/metastore_db;create=true. If i remove metastore_db from the derby/bin location it is starting and creating new metastore_db in derby/bin. I suspect metastore_db is corrupted but i am able to open the same metastore_db through the squirrel client. Here my doubt is metastore_db corrupted or some other problem. If it is corrupted how to recover the db. 
> Hive Unable to start due to metastore exception > --- > > Key: HIVE-1554 > URL: https://issues.apache.org/jira/browse/HIVE-1554 > Project: Hadoop Hive > Issue Type: Bug > Components: CLI, Metastore, Server Infrastructure >Affects Versions: 0.5.0 > Environment: Suse Linux v 11, Hadoop v .20.1. , Derby10.5.3 >Reporter: Soundararajan Velu > > When I try to restart Hive, it sometimes fails with a weird exception around > the metastore. > Following is the error message that it spits out. > 2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore > (HiveMetaStore.java:(107)) - Unable to initialize the metastore > :Exception thrown performing schema operation : Add classes to Catalog "", > Schema "APP" > We are using Derby 10.5.3 in server mode and have connected to Derby through > this URL: > jdbc:derby://{IP}:{PORT}/metastore_db;create=true. > If I remove the metastore_db from derby/bin, it starts and creates a new > metastore_db in derby/bin. I suspect the metastore_db gets corrupted for some > reason, but I am able to open the same metastore_db through other clients > like Squirrel, DBExplorer, etc. > My question is whether the metastore_db is corrupted or there is some other > problem, and if it is corrupted, how to recover the db. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1555) JDBC Storage Handler
JDBC Storage Handler Key: HIVE-1555 URL: https://issues.apache.org/jira/browse/HIVE-1555 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.5.0 Reporter: Bob Robertson With the Cassandra and HBase Storage Handlers in mind, I thought it would make sense to include a generic JDBC RDBMS Storage Handler so that you could import a standard DB table into Hive. Many people likely want to perform HiveQL joins, etc., against tables in other systems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Status: In Progress (was: Patch Available) There are additional log changes and a minor code change after the hadoop 0.20 tests. I'll upload a new patch once 0.17 finishes. > More generic and efficient merge method > --- > > Key: HIVE-1307 > URL: https://issues.apache.org/jira/browse/HIVE-1307 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1307.0.patch, HIVE-1307.patch, > HIVE-1307_java_only.patch > > > Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is > created to read the input files and output to one reducer for merging. This MR > job is created at compile time, one MR job per partition. In the > dynamic partition case, multiple partitions could be created at execution > time, and generating the merging MR job at compile time is impossible. > We should generalize the merge framework to allow multiple partitions; > most of the time a map-only job should be sufficient if we use > CombineHiveInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1554) Hive Unable to start due to metastore exception
Hive Unable to start due to metastore exception --- Key: HIVE-1554 URL: https://issues.apache.org/jira/browse/HIVE-1554 Project: Hadoop Hive Issue Type: Bug Components: CLI, Metastore, Server Infrastructure Affects Versions: 0.5.0 Environment: Suse Linux v 11, Hadoop v .20.1. , Derby10.5.3 Reporter: Soundararajan Velu Hive sometimes fails to restart because of a metastore exception. The exception is: 2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore (HiveMetaStore.java:(107)) - Unable to initialize the metastore :Exception thrown performing schema operation : Add classes to Catalog "", Schema "APP" We are using Derby 10.5.3 in server mode and have connected to Derby through this URL: jdbc:derby://{IP}:{PORT}/metastore_db;create=true. If I remove metastore_db from the derby/bin location, Hive starts and creates a new metastore_db in derby/bin. I suspect the metastore_db is corrupted, but I am able to open the same metastore_db through the Squirrel client. My question is whether the metastore_db is corrupted or there is some other problem. If it is corrupted, how can the db be recovered? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1553) NPE when using complex string UDF
[ https://issues.apache.org/jira/browse/HIVE-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899855#action_12899855 ] Wojciech Langiewicz commented on HIVE-1553: --- It also happens on columns that are not allowed to be NULL (and are not), so https://issues.apache.org/jira/browse/HIVE-1011 probably won't fix this. > NPE when using complex string UDF > - > > Key: HIVE-1553 > URL: https://issues.apache.org/jira/browse/HIVE-1553 > Project: Hadoop Hive > Issue Type: Bug > Components: UDF >Affects Versions: 0.5.0 > Environment: CDH3B2 version on debian >Reporter: Wojciech Langiewicz > > When executing this query: {code}select explode(split(city, "")) as char from > users;{code} I get NPE: {code}java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70) > at > org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347) > at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:170){code} > But in case of this 
query:{code}select explode(split(city, "")) as char from > users where id = 234234;{code} NPE does not occur, but in case of this query: > {code}select explode(split(city, "")) as char from users where id > 0;{code} > Some mappers succeed, but most of them fail, so the whole task fails. > city is a string column and maximum users.id is about 30M. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1553) NPE when using complex string UDF
[ https://issues.apache.org/jira/browse/HIVE-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wojciech Langiewicz updated HIVE-1553: -- Description: When executing this query: {code}select explode(split(city, "")) as char from users;{code} I get NPE: {code}java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70) at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170){code} But in case of this query:{code}select explode(split(city, "")) as char from users where id = 234234;{code} NPE does not occur, but in case of this query: {code}select explode(split(city, "")) as char from users where id > 0;{code} Some mappers succeed, but most of them fail, so the whole task fails. city is a string column and maximum users.id is about 30M. 
was: When executing this query: {code}select explode(split(city, "")) as char from users;{code} I get NPE: {code}java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70) at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170){code} But in case of this query:{code}select explode(split(city, "")) as char from users where id = 234234;{code} NPE does not occur. 
> NPE when using complex string UDF > - > > Key: HIVE-1553 > URL: https://issues.apache.org/jira/browse/HIVE-1553 > Project: Hadoop Hive > Issue Type: Bug > Components: UDF >Affects Versions: 0.5.0 > Environment: CDH3B2 version on debian >Reporter: Wojciech Langiewicz > > When executing this query: {code}select explode(split(city, "")) as char from > users;{code} I get NPE: {code}java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70) > at > org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) > at > org
[jira] Created: (HIVE-1553) NPE when using complex string UDF
NPE when using complex string UDF - Key: HIVE-1553 URL: https://issues.apache.org/jira/browse/HIVE-1553 Project: Hadoop Hive Issue Type: Bug Components: UDF Affects Versions: 0.5.0 Environment: CDH3B2 version on debian Reporter: Wojciech Langiewicz When executing this query: {code}select explode(split(city, "")) as char from users;{code} I get NPE: {code}java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70) at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170){code} But in case of this query:{code}select explode(split(city, "")) as char from users where id = 234234;{code} NPE does not occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
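[Editorial note] One plausible (unconfirmed) reading of the stack trace is that explode() receives a NULL array for some rows and dereferences it in GenericUDTFExplode.process. A null-guarded explode, sketched here in illustrative Python rather than Hive's actual Java, would skip such rows instead of throwing:

```python
def explode(rows):
    """Null-tolerant explode sketch: emit one output row per array
    element, skipping rows whose array is NULL instead of
    dereferencing it (the dereference is what would raise the NPE)."""
    for arr in rows:
        if arr is None:
            continue
        for item in arr:
            yield item
```

Whether Hive should skip NULL arrays or emit nothing for them is a semantics decision for the fix; the sketch only shows the guard.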
[jira] Commented: (HIVE-1552) Nulls are not handled in Sort Merge MapJoin
[ https://issues.apache.org/jira/browse/HIVE-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899811#action_12899811 ] Amareshwari Sriramadasu commented on HIVE-1552: --- Are NULL values allowed for a sorted column? I think that the answer is Yes, because insert/load does not complain about null values. > Nulls are not handled in Sort Merge MapJoin > --- > > Key: HIVE-1552 > URL: https://issues.apache.org/jira/browse/HIVE-1552 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > > If SMBMAPJoinOperator finds null keys in Join it fails with > NullPointerException : > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.io.IntWritable.compareTo(IntWritable.java:60) > at > org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:115) > at > org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.compareKeys(SMBMapJoinOperator.java:389) > at > org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processKey(SMBMapJoinOperator.java:438) > at > org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:205) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:698) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458) > at > org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.fetchOneRow(SMBMapJoinOperator.java:479) > ... 17 more > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1544) Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved
[ https://issues.apache.org/jira/browse/HIVE-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899805#action_12899805 ] Amareshwari Sriramadasu commented on HIVE-1544: --- Also, see Namit's [comment|https://issues.apache.org/jira/browse/HIVE-741?focusedCommentId=12899177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12899177] on HIVE-741 > Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join > involved > --- > > Key: HIVE-1544 > URL: https://issues.apache.org/jira/browse/HIVE-1544 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang > > As discussed in HIVE-741, if a plan indicates that a non-outer join is the > first operator in the reducer, the ReduceSinkOperator should filter out (not > send) rows with NULL keys since they will not generate any results > anyway. This should save both bandwidth and processing power. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
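[Editorial note] The proposed optimization amounts to a filter in front of the shuffle. A minimal Python sketch of the idea (names are mine, not Hive's):

```python
def reduce_sink(rows, key_fn, outer_join=False):
    """For non-outer joins, drop NULL-keyed rows before the shuffle:
    a NULL key can never match, so shipping such rows only wastes
    bandwidth and reducer time. Outer joins must keep them, since
    they still contribute NULL-padded output rows."""
    for row in rows:
        key = key_fn(row)
        if key is None and not outer_join:
            continue
        yield key, row
```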
[jira] Created: (HIVE-1552) Nulls are not handled in Sort Merge MapJoin
Nulls are not handled in Sort Merge MapJoin
-------------------------------------------

                 Key: HIVE-1552
                 URL: https://issues.apache.org/jira/browse/HIVE-1552
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu

If SMBMapJoinOperator finds null keys in a join, it fails with a NullPointerException:
{noformat}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.io.IntWritable.compareTo(IntWritable.java:60)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:115)
        at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.compareKeys(SMBMapJoinOperator.java:389)
        at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processKey(SMBMapJoinOperator.java:438)
        at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:205)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:698)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
        at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.fetchOneRow(SMBMapJoinOperator.java:479)
        ... 17 more
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-741:
-----------------------------------------

    Status: Patch Available  (was: Open)

Submitting patch-741-1.txt for review.

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741.txt, smbjoin_nulls.q.txt
>
> With the following data in table input4_cb:
> Key    Value
> ------------
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325   18   NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
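[Editorial note] The expected semantics in the HIVE-741 description can be checked with a small self-contained sketch (hypothetical code, not part of any Hive patch): a SQL equality comparison involving NULL is never true, so with input4_cb = {(NULL, 325), (18, NULL)} the self-join on a.key = b.value must produce no rows.

```java
// Hypothetical sketch verifying the semantics claimed in HIVE-741.
class NullJoinSemantics {

    // SQL-style equality: NULL = anything (including NULL) is not true.
    static boolean sqlEquals(Integer x, Integer y) {
        return x != null && y != null && x.equals(y);
    }

    // Counts rows produced by: select * from t a join t b on a.key = b.value
    static int countJoinedRows(Integer[][] rows) {
        int count = 0;
        for (Integer[] a : rows) {       // a: (key, value)
            for (Integer[] b : rows) {   // b: (key, value)
                if (sqlEquals(a[0], b[1])) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Table input4_cb from the issue: (key, value) = (NULL, 325), (18, NULL)
        Integer[][] input4cb = { { null, 325 }, { 18, null } };
        System.out.println(countJoinedRows(input4cb)); // 0 -- the join is empty
    }
}
```

The buggy result reported in the issue (the row NULL, 325, 18, NULL) is exactly what falls out if NULL keys are compared with plain equality, i.e. if sqlEquals were replaced by a null-tolerant equals: a.key = NULL then "matches" b.value = NULL.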
[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-741:
-----------------------------------------

    Attachment: patch-741-1.txt
                smbjoin_nulls.q.txt

Attaching a patch that fixes the bugs Ning found in the earlier patch. It also adds more test cases.

bq. Can you also add one or few tests for sort merge join?
The attached file smbjoin_nulls.q.txt has tests for sort merge join, but it fails with an NPE as mentioned earlier. I tried to fix the NPE but could not come up with a fix. Shall I do it in a follow-up JIRA?

bq. For inner, left and right outer joins, a simpler fix would be to add a filter on top.
I think this can be done as part of HIVE-1544 as an improvement.

bq. @Amareshwari, sorry the syntax was wrong for the 3 table joins.
Ning, Hive was not complaining about the syntax, so I included this in the test case as well. The results are fine with the latest patch.

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741.txt, smbjoin_nulls.q.txt
>
> With the following data in table input4_cb:
> Key    Value
> ------------
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325   18   NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.