[jira] [Updated] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3420:
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

Inefficiency in hbase handler when process query including rowkey range scan
----------------------------------------------------------------------------
    Key: HIVE-3420
    URL: https://issues.apache.org/jira/browse/HIVE-3420
    Project: Hive
    Issue Type: Improvement
    Components: HBase Handler
    Environment: Hive-0.9.0 + HBase-0.94.1
    Reporter: Gang Deng
    Assignee: Navis
    Priority: Critical
    Fix For: 0.13.0
    Attachments: HIVE-3420.D7311.1.patch
    Original Estimate: 2h
    Remaining Estimate: 2h

When querying Hive with an HBase rowkey range, the map tasks do not use the startRow/endRow information in the TableSplit. For example, if the row keys fit into 5 HBase files, there will be 5 map tasks. Ideally each task would process 1 file, but in the current implementation each task processes all 5 files. This behavior not only wastes network bandwidth but also worsens lock contention in the HBase block cache, since every task has to access the same blocks.

The problem code is in HiveHBaseTableInputFormat.convertFilter:

    ......
    if (tableSplit != null) {
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          startRow,
          stopRow,
          tableSplit.getRegionLocation());
    }
    scan.setStartRow(startRow);
    scan.setStopRow(stopRow);
    ......

Since the TableSplit already includes the startRow/endRow information of its region, a better implementation is:

    ......
    byte[] splitStart = startRow;
    byte[] splitStop = stopRow;
    if (tableSplit != null) {
      if (tableSplit.getStartRow() != null) {
        splitStart = startRow.length == 0 ||
            Bytes.compareTo(tableSplit.getStartRow(), startRow) >= 0 ?
                tableSplit.getStartRow() : startRow;
      }
      if (tableSplit.getEndRow() != null) {
        splitStop = (stopRow.length == 0 ||
            Bytes.compareTo(tableSplit.getEndRow(), stopRow) <= 0) &&
            tableSplit.getEndRow().length > 0 ?
                tableSplit.getEndRow() : stopRow;
      }
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          splitStart,
          splitStop,
          tableSplit.getRegionLocation());
    }
    scan.setStartRow(splitStart);
    scan.setStopRow(splitStop);
    ......

In my test, the changed code improved performance by more than 30%.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
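The fix above is a range intersection: each mapper should scan the overlap of its split's row range and the filter's row range. A minimal, self-contained Java sketch of that logic, where `compareTo` is a local stand-in for HBase's `Bytes.compareTo` (unsigned lexicographic comparison) and an empty key means "unbounded", following HBase's convention:

```java
// Sketch of the range-intersection logic behind the HIVE-3420 patch.
// Self-contained: the compareTo below mimics org.apache.hadoop.hbase.util.Bytes.
public class SplitRangeIntersect {

    // Unsigned lexicographic byte comparison, as HBase's Bytes.compareTo does.
    static int compareTo(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    // Effective start: the later of the two start keys (empty = unbounded).
    static byte[] effectiveStart(byte[] splitStart, byte[] filterStart) {
        if (filterStart.length == 0) return splitStart;
        if (splitStart.length == 0) return filterStart;
        return compareTo(splitStart, filterStart) >= 0 ? splitStart : filterStart;
    }

    // Effective stop: the earlier of the two stop keys (empty = unbounded).
    static byte[] effectiveStop(byte[] splitStop, byte[] filterStop) {
        if (filterStop.length == 0) return splitStop;
        if (splitStop.length == 0) return filterStop;
        return compareTo(splitStop, filterStop) <= 0 ? splitStop : filterStop;
    }

    public static void main(String[] args) {
        byte[] splitStart = "row100".getBytes();
        byte[] splitStop  = "row200".getBytes();
        byte[] qStart     = "row150".getBytes();  // filter: key >= 'row150'
        byte[] qStop      = "row500".getBytes();  // filter: key <  'row500'
        // This split should scan only [row150, row200), not the whole filter range.
        System.out.println(new String(effectiveStart(splitStart, qStart))); // row150
        System.out.println(new String(effectiveStop(splitStop, qStop)));    // row200
    }
}
```

With the buggy code, every split scanned the full [row150, row500) filter range, which is exactly the duplicated work described above.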
[jira] [Resolved] (HIVE-4247) Filtering on a hbase row key duplicates results across multiple mappers
[ https://issues.apache.org/jira/browse/HIVE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-4247.
------------------------------------
    Resolution: Duplicate
    Fix Version/s: 0.13.0

Fixed via HIVE-3420

Filtering on a hbase row key duplicates results across multiple mappers
-----------------------------------------------------------------------
    Key: HIVE-4247
    URL: https://issues.apache.org/jira/browse/HIVE-4247
    Project: Hive
    Issue Type: Bug
    Components: HBase Handler
    Affects Versions: 0.9.0
    Environment: All Platforms
    Reporter: Karthik Kumara
    Labels: patch
    Fix For: 0.13.0
    Attachments: HiveHBaseTableInputFormat.patch

Steps to reproduce:
1. Create a Hive external table with the HBase storage handler, with enough data in the HBase table to spawn multiple mappers for the Hive query.
2. Write a query with a filter (in the WHERE clause) based on the HBase row key.
3. Run the map-reduce job: each mapper queries the entire data set, duplicating the data. Each mapper processes the entire filtered range, so the results are multiplied by the number of mappers that run.

Expected behavior: each mapper should process a different part of the data, with no duplication.

Cause: the convertFilter method in HiveHBaseTableInputFormat rewrites the start and stop row for each split, which leads each mapper to process the entire range:

    if (tableSplit != null) {
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          startRow,
          stopRow,
          tableSplit.getRegionLocation());
    }

The scan already has the start and stop row set when the splits are created, so this piece of code is probably redundant.
[jira] [Commented] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774003#comment-13774003 ]

Hudson commented on HIVE-3420:
------------------------------

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #179 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/179/])
HIVE-3420 : Inefficiency in hbase handler when process query including rowkey range scan (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525329)
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-5154:
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

Remove unnecessary array creation in ReduceSinkOperator
-------------------------------------------------------
    Key: HIVE-5154
    URL: https://issues.apache.org/jira/browse/HIVE-5154
    Project: Hive
    Issue Type: Task
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Trivial
    Fix For: 0.13.0
    Attachments: HIVE-5154.D12549.1.patch

A key array is created for each row, which does not seem necessary.
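The pattern behind this kind of patch can be illustrated in isolation: hoist a per-row temporary array into a field that is allocated once per operator, then refill it for each row. The class below is an illustrative sketch only, not Hive's actual ReduceSinkOperator code:

```java
// Illustrative sketch of hoisting a per-row allocation out of the hot path.
// Instead of `new Object[n]` for every row, one buffer is reused.
public class ReusedKeyBuffer {
    private final Object[] keyBuffer;   // allocated once, not once per row

    public ReusedKeyBuffer(int numKeyColumns) {
        keyBuffer = new Object[numKeyColumns];
    }

    // Fills the reused buffer with the key columns of the given row.
    // Safe only because the caller consumes (serializes/copies) the result
    // before the next row is processed -- the invariant such patches rely on.
    public Object[] extractKeys(Object[] row, int[] keyColumnIndexes) {
        for (int i = 0; i < keyColumnIndexes.length; i++) {
            keyBuffer[i] = row[keyColumnIndexes[i]];
        }
        return keyBuffer;
    }

    public static void main(String[] args) {
        ReusedKeyBuffer op = new ReusedKeyBuffer(2);
        Object[] k1 = op.extractKeys(new Object[]{"a", "b", "c"}, new int[]{0, 2});
        Object[] k2 = op.extractKeys(new Object[]{"x", "y", "z"}, new int[]{0, 2});
        System.out.println(k1 == k2);   // true: same buffer, no per-row garbage
    }
}
```

The trade-off is the aliasing caveat in the comment: reuse is only correct when each row's key is fully consumed before the next row overwrites the buffer.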
[jira] [Updated] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-5253:
----------------------------------
    Attachment: HIVE-5253.3.patch.txt

Create component to compile and jar dynamic code
------------------------------------------------
    Key: HIVE-5253
    URL: https://issues.apache.org/jira/browse/HIVE-5253
    Project: Hive
    Issue Type: Sub-task
    Reporter: Edward Capriolo
    Assignee: Edward Capriolo
    Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.patch.txt
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774088#comment-13774088 ]

Hudson commented on HIVE-4113:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #450 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/450/])
HIVE-4113 : Optimize select count(1) with RCFile and Orc (Brock Noland and Yin Huai via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525322)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes2.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes3.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes5.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/udf_row_sequence.q.out
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_queries.q.out
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_single_sourced_multi_insert.q.out
* /hive/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
* /hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
* /hive/trunk/hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* /hive/trunk/ql/src/test/queries/clientpositive/binary_table_colserde.q
* /hive/trunk/ql/src/test/results/clientpositive/auto_join0.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join15.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join18.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join30.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join31.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join_reordering_values.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_smb_mapjoin_14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/binary_output_format.q.out
* /hive/trunk/ql/src/test/results/clientpositive/binary_table_colserde.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out
*
[jira] [Commented] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774095#comment-13774095 ]

Hudson commented on HIVE-5154:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #451 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/451/])
HIVE-5154 : Remove unnecessary array creation in ReduceSinkOperator (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525381)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
[jira] [Commented] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774096#comment-13774096 ]

Hudson commented on HIVE-3420:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #451 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/451/])
HIVE-3420 : Inefficiency in hbase handler when process query including rowkey range scan (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525329)
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
[jira] [Commented] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774126#comment-13774126 ]

Ashutosh Chauhan commented on HIVE-664:
---------------------------------------

+1

optimize UDF split
------------------
    Key: HIVE-664
    URL: https://issues.apache.org/jira/browse/HIVE-664
    Project: Hive
    Issue Type: Bug
    Components: UDF
    Reporter: Namit Jain
    Assignee: Teddy Choi
    Labels: optimization
    Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt

Min Zhou added a comment - 21/Jul/09 07:34 AM
It's very useful for us. Some comments:
1. Can you implement it directly with Text? Avoiding string decoding and encoding would be faster. Of course that trick may lead to another problem, as String.split uses a regular expression for splitting.
2. getDisplayString() always returns a string in lowercase.

Namit Jain added a comment - 21/Jul/09 09:22 AM
Committed. Thanks Emil

Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
There are some easy (compromise) ways to optimize split:
1. Check if the regex argument actually contains some regex-specific characters, and if it doesn't, do a straightforward split without converting to strings.
2. Assume some default value for the second argument (for example, split(str) would be equivalent to split(str, ' ')) and optimize for this value.
3. Have two separate split functions - one that does regex and one that splits around plain text.
I think that 1 is a good choice and can be done rather quickly.
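Option 1 above can be sketched as follows: test whether the separator contains any regex metacharacters, and if not, split with plain substring scanning instead of java.util.regex. The class and method names are illustrative, not the actual Hive UDF implementation:

```java
// Sketch of "option 1": regex split only when the separator really is a regex.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FastSplit {
    // Characters that have special meaning in java.util.regex patterns.
    private static final String REGEX_META = ".$|()[]{}^?*+\\";

    static boolean isPlainText(String sep) {
        for (int i = 0; i < sep.length(); i++) {
            if (REGEX_META.indexOf(sep.charAt(i)) >= 0) return false;
        }
        return true;
    }

    static List<String> split(String s, String sep) {
        // Fall back to the regex path for real patterns (and the empty
        // separator, whose semantics are regex-specific).
        if (sep.isEmpty() || !isPlainText(sep)) {
            return new ArrayList<>(Arrays.asList(s.split(sep, -1)));
        }
        // Fast path: no Pattern compilation, no backtracking.
        List<String> out = new ArrayList<>();
        int from = 0, at;
        while ((at = s.indexOf(sep, from)) >= 0) {
            out.add(s.substring(from, at));
            from = at + sep.length();
        }
        out.add(s.substring(from));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,c", ","));    // [a, b, c]  (fast path)
        System.out.println(split("a1b2c", "\\d"));  // [a, b, c]  (regex path)
    }
}
```

Both paths use limit -1 semantics (trailing empty fields are kept), so the fast path stays behaviorally consistent with the regex path.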
[jira] [Commented] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774211#comment-13774211 ]

Vaibhav Gumashta commented on HIVE-4763:
----------------------------------------

[~cwsteinbach] [~thejas] Uploaded the changes on phabricator: https://reviews.facebook.net/D12951

add support for thrift over http transport in HS2
-------------------------------------------------
    Key: HIVE-4763
    URL: https://issues.apache.org/jira/browse/HIVE-4763
    Project: Hive
    Issue Type: Sub-task
    Components: HiveServer2
    Reporter: Thejas M Nair
    Assignee: Vaibhav Gumashta
    Fix For: 0.12.0
    Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch

Subtask for adding support for http transport mode for the thrift API in HiveServer2. Support for the different authentication modes will be part of another subtask.
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774212#comment-13774212 ]

Sushanth Sowmyan commented on HIVE-5274:
----------------------------------------

Okay, having generated a simple patch that changes the imports, tests are failing because of issues casting HBaseHCatStorageHandler to an HCatStorageHandler (which the old HCIF/HCOF do), which it no longer is - it is a HiveStorageHandler. So, what we can do is as follows:

a) The old code can be rewritten to use HiveStorageHandler, as in a post-HIVE-5261 mode - i.e., the only difference between org.apache.hcatalog.* and org.apache.hive.hcatalog.* would be the package names. But if we did that, it might break external storage handlers that people have written against HCatStorageHandler and expect to keep working in an org.apache.hcatalog.* world.

b) Viraj's changes to the hbase hcat storage handler can be changed so that it continues to follow the old model, using HCatStorageHandler and not HiveStorageHandler.

The latter is the less disruptive change, but it effectively relegates its use to the org.apache.hcatalog.* world. I still see that as ideal here. However, if this approach is taken, then HCatHBaseStorageHandler is no longer usable by the org.apache.hive.hcatalog.* world. That said, I do not think that is a problem, since we're trying to deprecate this too, and we'd be encouraging use of the Hive HBaseStorageHandler for any new code that might use the new packages.

[~alangates], [~ekoifman]: Any thoughts/opinions?

HCatalog package renaming backward compatibility follow-up
----------------------------------------------------------
    Key: HIVE-5274
    URL: https://issues.apache.org/jira/browse/HIVE-5274
    Project: Hive
    Issue Type: Bug
    Components: HCatalog
    Affects Versions: 0.12.0
    Reporter: Sushanth Sowmyan
    Assignee: Sushanth Sowmyan
    Fix For: 0.12.0

As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes.

==

Note: the above is a complete description of this issue in and of itself; the following is more detail on the backward-compatibility goal I have (not saying that each of these things is violated):

a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes.

b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages internally (say HCatUtil, for example), as long as any interfaces only expose org.apache.hcatalog.*. For tests that test org.apache.hcatalog.*, we must be capable of testing it from a pure org.apache.hcatalog.* world. It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, even in tests.
[jira] [Updated] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-664:
----------------------------------
    Status: Open  (was: Patch Available)

Test {{global_limit.q}} failed.

optimize UDF split
------------------
    Key: HIVE-664
    URL: https://issues.apache.org/jira/browse/HIVE-664
    Project: Hive
    Issue Type: Bug
    Components: UDF
    Reporter: Namit Jain
    Assignee: Teddy Choi
    Labels: optimization
    Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5279:
------------------------
    Status: Patch Available  (was: Open)

Missed those tests and sorry for the delay.

Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
-----------------------------------------------------------
    Key: HIVE-5279
    URL: https://issues.apache.org/jira/browse/HIVE-5279
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Critical
    Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch

We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not serializable and fails the query. The log below is an example:

{noformat}
java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
Serialization trace:
inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
	at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
	at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
	at org.apache.h
{noformat}

If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0.
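The "missing no-arg constructor" complaint in the trace above comes from Kryo's default instantiation strategy, which reflectively invokes a class's no-arg constructor when deserializing. The self-contained sketch below reproduces that constraint with plain JDK reflection; the nested classes are stand-ins, not the Hive or Kryo ones:

```java
// Illustration of why a serializer that relies on the default constructor
// fails on classes like StandardListObjectInspector in the trace above.
public class NoArgCtorDemo {
    static class WithNoArg {
        WithNoArg() {}
    }

    static class WithoutNoArg {
        final int field;
        WithoutNoArg(int field) { this.field = field; }
    }

    // Roughly what a default-constructor instantiation strategy does.
    static boolean defaultInstantiable(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Analogous to "Class cannot be created (missing no-arg constructor)"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultInstantiable(WithNoArg.class));    // true
        System.out.println(defaultInstantiable(WithoutNoArg.class)); // false
    }
}
```

This is why the usual remedies are either adding a no-arg constructor to the offending classes or configuring the serializer with a different instantiation strategy.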
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5279: -- Attachment: D12963.4.patch navis updated the revision HIVE-5279 [jira] Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc. Fixed test fails Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12963 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12963?vs=40065id=40299#toc BRANCH HIVE-5279 ARCANIST PROJECT hive AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSumList.java ql/src/test/queries/clientpositive/udaf_sum_list.q ql/src/test/results/clientpositive/udaf_sum_list.q.out ql/src/test/results/compiler/plan/groupby1.q.xml ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml ql/src/test/results/compiler/plan/groupby5.q.xml To: JIRA, ashutoshc, navis Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch, D12963.4.patch We didn't forced GenericUDAFEvaluator to be Serializable. I don't know how previous serialization mechanism solved this but, kryo complaints that it's not Serializable and fails the query. 
[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
[ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774244#comment-13774244 ] Navis commented on HIVE-5320: - It seems to be a bug in json-serde (which was in Hive before, too), not in Hive. Why should we add a workaround for this? Querying a table with nested struct type over JSON data results in errors - Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-5320.patch Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors including java.lang.IndexOutOfBoundsException or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class receives the same reference in its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket with JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31) 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned from list = soi.getStructFieldsDataAsList(obj) when soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. 
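The shared-list bug described above can be reproduced with a toy inspector that, like the reported JsonStructObjectInspector, hands every caller the same mutable list. This is an illustrative sketch, not the actual SerDe code; it also shows the defensive copy from solution 2:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedListDemo {
    // Toy inspector state: one list reused across calls, like the reported bug.
    static final List<Object> SHARED = new ArrayList<>();

    static List<Object> getStructFieldsDataAsList(Object... fields) {
        SHARED.clear();
        for (Object f : fields) SHARED.add(f);
        return SHARED;  // every caller sees the same list object
    }

    public static void main(String[] args) {
        List<Object> outer = getStructFieldsDataAsList("a1", "a2");
        // A nested (recursive) call overwrites the outer call's data:
        getStructFieldsDataAsList("b1", "b2");
        System.out.println(outer.get(0));  // "b1", not "a1" -- corrupted

        // Defensive fix on the caller side (solution 2): copy before recursing.
        List<Object> safeOuter = new ArrayList<>(getStructFieldsDataAsList("a1", "a2"));
        getStructFieldsDataAsList("b1", "b2");
        System.out.println(safeOuter.get(0));  // still "a1"
    }
}
```

The copy costs one allocation per struct, which is why fixing the inspector itself (solution 1, instance-scoped `values`) is arguably the cleaner fix.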
[jira] [Commented] (HIVE-1577) Add configuration property hive.exec.local.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774255#comment-13774255 ] Lefty Leverenz commented on HIVE-1577: -- Added hive.exec.local.scratchdir to Hive Admin Configuration wikidoc -- [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration]. Also updated the default value of hive.exec.scratchdir. Add configuration property hive.exec.local.scratchdir - Key: HIVE-1577 URL: https://issues.apache.org/jira/browse/HIVE-1577 Project: Hive Issue Type: New Feature Components: Configuration Reporter: Carl Steinbach When Hive is run in local mode it uses the hardcoded local directory {{/${java.io.tmpdir}/${user.name}}} for temporary files. This path should be configurable via the property {{hive.exec.local.scratchdir}}. 
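With the property in place, it would presumably be set like any other Hive property in hive-site.xml. The property name comes from the issue; the value below is only an example path:

```xml
<!-- Example only: pick a local directory writable by the user running Hive. -->
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive-local-scratch</value>
  <description>Local scratch space for Hive jobs when running in local mode.</description>
</property>
```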
[jira] [Commented] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774261#comment-13774261 ] Ashutosh Chauhan commented on HIVE-5172: Though the fix seems harmless, I think it doesn't eliminate the root cause. Under intense GC pressure, the same problem may occur even with this fix. I am fine with putting this in, but just be aware that it has not solved the root cause. TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server - Key: HIVE-5172 URL: https://issues.apache.org/jira/browse/HIVE-5172 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: agate Attachments: HIVE-5172.1.patch.txt We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGIBasedProcessor on the server. 
{code}
hive client logs:
=
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
	... 31 more
{code}
{code}
hive metastore server logs:
===
2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at
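The NPE path above is consistent with a transport cache handing back null once an entry has been lost (e.g. a weakly-referenced entry collected under GC pressure). A hedged Java sketch of the defensive pattern discussed in the comment (the `Transport` class and method names are illustrative, not Thrift's API): the caller falls back to a fresh object instead of dereferencing null, which avoids the NPE but leaves the underlying eviction behavior, i.e. the root cause, untouched.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class NullTransportDemo {
    // Stand-in for a transport object; the name is illustrative.
    static class Transport {
        final String peer;
        Transport(String peer) { this.peer = peer; }
    }

    // Weakly-keyed cache: entries can vanish after GC, so get() may return null.
    static final Map<Object, Transport> CACHE = new WeakHashMap<>();

    // Defensive lookup: never hand null to the caller. This is the shape of
    // the "harmless fix" -- the entry can still be lost again under GC
    // pressure, so the root cause remains.
    static Transport getTransport(Object key) {
        Transport t = CACHE.get(key);
        if (t == null) {
            t = new Transport("recreated");
            CACHE.put(key, t);
        }
        return t;
    }

    public static void main(String[] args) {
        // Simulate a lost entry: the key was never cached (or was evicted).
        Transport t = getTransport(new Object());
        System.out.println(t != null);  // caller never observes null
    }
}
```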
[jira] [Resolved] (HIVE-5340) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-5340. Resolution: Invalid {{ant very-clean}} fixed the problem in my setup. TestJdbcDriver2 is failing on trunk. Key: HIVE-5340 URL: https://issues.apache.org/jira/browse/HIVE-5340 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Updated] (HIVE-5338) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5338: --- Component/s: Tests TestJdbcDriver2 is failing on trunk. Key: HIVE-5338 URL: https://issues.apache.org/jira/browse/HIVE-5338 Project: Hive Issue Type: Bug Components: Tests Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Updated] (HIVE-5338) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5338: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-5340 TestJdbcDriver2 is failing on trunk. Key: HIVE-5338 URL: https://issues.apache.org/jira/browse/HIVE-5338 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Commented] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774270#comment-13774270 ] Ashutosh Chauhan commented on HIVE-5221: yup.. No default decoding. Just remove the conditional decoding. This will break backward compat, but users can use a UDF to get the desired behavior. We currently have a bug; I don't think there is any advantage in being backward-bug-compatible. That is just backwards : ) Issue in column type with data type as BINARY Key: HIVE-5221 URL: https://issues.apache.org/jira/browse/HIVE-5221 Project: Hive Issue Type: Bug Reporter: Arun Vasu Assignee: Mohammad Kamrul Islam Priority: Critical Attachments: HIVE-5221.1.patch Hi, I am using Hive 0.10. When I create an external table with a column of type BINARY, the query result on the table shows junk values for the binary column. Please find below the query I have used to create the table: CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/hivetables/testbinary'; The query I have used is: select * from bool1 The sample data in the hdfs file is: 0^a...@abc.com^001 1^a...@abc.com^010 ^a...@abc.com^011 ^a...@abc.com^100 t^a...@abc.com^101 f^a...@abc.com^110 true^a...@abc.com^111 false^a...@abc.com^001 123^^01100010 12344^^0111 Please share your inputs if it is possible. Thanks, Arun 
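The direction suggested above (no implicit conditional decoding of BINARY columns; decoding only when the user explicitly asks for it) can be illustrated with plain Java. The helper methods below are a sketch of that contract, not Hive's UDF API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryDecodeDemo {
    // Default behavior: hand back the stored bytes untouched, no guessing.
    static byte[] rawColumn(byte[] stored) {
        return stored;
    }

    // Explicit, user-invoked decoding (the role a decode UDF would play).
    static String decodeBase64(byte[] stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = "MDEx".getBytes(StandardCharsets.UTF_8);  // base64 of "011"
        // No implicit decoding: raw bytes are returned as-is...
        System.out.println(new String(rawColumn(stored), StandardCharsets.UTF_8));  // MDEx
        // ...and decoding happens only when explicitly requested.
        System.out.println(decodeBase64(stored));  // 011
    }
}
```

Keeping the default path dumb makes behavior predictable; a user who stored base64 opts into decoding, instead of the engine guessing from the data's shape.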
[jira] [Created] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description
Rakesh Chouhan created HIVE-5341: Summary: Link doesn't work. Needs to be updated as mentioned in the Description Key: HIVE-5341 URL: https://issues.apache.org/jira/browse/HIVE-5341 Project: Hive Issue Type: Bug Components: Documentation Reporter: Rakesh Chouhan Priority: Blocker Go to the Apache Hive Getting Started documentation https://cwiki.apache.org/confluence/display/Hive/GettingStarted Under the section Simple Example Use Cases, MovieLens User Ratings: wget http://www.grouplens.org/system/files/ml-data.tar+0.gz The link mentioned in the document does not work. It needs to be updated to the URL below: http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz I am setting this defect's priority as a Blocker because users will not be able to continue their hands-on exercises unless they find the correct URL to download the mentioned file. Referenced from: http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E. 