Re: Hive BeeLine
Hi Noam, The file exists on my machine; I can do cat, ls, ll, etc. on it.

On Mon, Jul 6, 2015 at 2:31 AM, Noam Hasson noam.has...@kenshoo.com wrote: Just making sure: LOAD DATA LOCAL INPATH loads files from your local file system, so did you make sure the file exists on your machine?

On Mon, Jul 6, 2015 at 12:15 PM, Trainee Bingo trainee1...@gmail.com wrote: Hi Users, I have Hive and HiveServer2 on the same machine, but when I try to *LOAD DATA LOCAL INPATH* using BeeLine I get an Invalid Path error, while LOAD DATA INPATH succeeds. Can anyone please tell me why LOCAL INPATH does not work? Thanks, Trainee.
Hive BeeLine
Hi Users, I have Hive and HiveServer2 on the same machine, but when I try to *LOAD DATA LOCAL INPATH* using BeeLine I get an Invalid Path error, while LOAD DATA INPATH succeeds. Can anyone please tell me why LOCAL INPATH does not work? Thanks, Trainee.
Re: Hive BeeLine
Just making sure: LOAD DATA LOCAL INPATH loads files from your local file system, so did you make sure the file exists on your machine?

On Mon, Jul 6, 2015 at 12:15 PM, Trainee Bingo trainee1...@gmail.com wrote: Hi Users, I have Hive and HiveServer2 on the same machine, but when I try to *LOAD DATA LOCAL INPATH* using BeeLine I get an Invalid Path error, while LOAD DATA INPATH succeeds. Can anyone please tell me why LOCAL INPATH does not work? Thanks, Trainee.
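For reference, a minimal sketch of the two forms discussed in this thread, using a hypothetical table name and hypothetical file paths (both are illustrative only, not taken from the thread):

-- With LOCAL the path refers to the local file system of the machine executing
-- the statement; when connecting through BeeLine it is typically the HiveServer2
-- process that reads the file, so it must exist and be readable on that host.
LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE logs;

-- Without LOCAL the path is resolved on HDFS instead.
LOAD DATA INPATH '/user/hive/data/sample.txt' INTO TABLE logs;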
Re: Is hive 0.13 index working fine on partition tables?
Does anyone know the JIRAs related to this issue?

On Mon, Jun 29, 2015 at 2:35 PM, Jim Green openkbi...@gmail.com wrote: Hi Team, on Hive 0.13 I have a minimal reproduction of the index-on-partitioned-table issue:

CREATE TABLE test_partition_index(
  id1 bigint,
  id2 bigint,
  id3 bigint)
PARTITIONED BY (dt string)
row format delimited fields terminated by ',';

cat sampledata
111,222,333

LOAD DATA LOCAL INPATH 'sampledata' OVERWRITE INTO TABLE test_partition_index PARTITION (dt='20150101');
LOAD DATA LOCAL INPATH 'sampledata' OVERWRITE INTO TABLE test_partition_index PARTITION (dt='20150102');

CREATE INDEX test_partition_index_idx ON TABLE test_partition_index (id1) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX test_partition_index_idx ON test_partition_index REBUILD;

set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=1;

select * from test_partition_index where dt in (20150101) and id1=111;

The error is:

Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: cannot find dir = xxx:/user/hive/warehouse/test_partition_index/dt=20150102/sampledata in pathToPartitionInfo: [xxx:/user/hive/warehouse/test_partition_index/dt=20150101]
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:344)
    at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.doGetSplits(HiveIndexedInputFormat.java:81)
    at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:149)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:135)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Is this issue fixed in the latest version of Hive? If so, which JIRA is related? Thanks.

--
Thanks,
www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
Limiting outer join
Hi, In the following query, is it possible to limit the number of entries returned by an outer join to a single value? I want to obtain a single country from ipv4geotable for each entry in logontable.

CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4
  ON isIpv4(logon.IP)
WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;

For instance, if I had the IP W.X.Y.Z in logontable, and W.X.Y.Z fell in the range of both Italy and Spain in ipv4geotable, then I would like to associate it with Italy only. I've tried adding LIMIT 1 to the second subquery:

(SELECT StartIp, EndIp, Country FROM ipv4geotable LIMIT 1) ipv4 ON isIpv4(logon.IP)

but this is wrong since the WHERE clause has to traverse all IPs. Limiting the WHERE clause doesn't help either. Any ideas? Thank you! B
Re: Limiting outer join
In the following query, is it possible to limit the number of entries returned by an outer join to a single value? I want to obtain a single country from ipv4geotable for each entry in logontable.

Yes, the PTFs DENSE_RANK()/ROW_NUMBER() basically give you that - you can read the first row out of each logon.IP group, except there's no way to force which country wins over the other without an ORDER BY country in the OVER() clause as well. That said, it will only get slower to produce 1 row per group: because of the distributed nature of the SQL engine, the reduction of data happens after an ordering shuffle.

You're doing range joins in a SQL engine without theta joins, and MapReduce had no way to implement those at runtime (Tez has, with EdgeManager plugins). The easiest/traditional approach to doing geo-IP lookups is a compact UDF model without any joins at all. There are some old threads discussing this as a built-in, with some code (with potential licensing issues) - http://markmail.org/message/w54j4upwg2wbh3xg

Cheers, Gopal
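A minimal sketch of the windowing approach described above, reusing the table and column names from the question; the subquery aliases, the rn column, and the choice of ordering matches alphabetically by Country are assumptions for illustration:

CREATE TABLE ipv4table AS
SELECT IP, Country
FROM (
  SELECT logon.IP,
         ipv4.Country,
         -- one row per IP; the ORDER BY decides which country "wins"
         ROW_NUMBER() OVER (PARTITION BY logon.IP ORDER BY ipv4.Country) AS rn
  FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon
  LEFT OUTER JOIN (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4
    ON isIpv4(logon.IP)
  WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp
) ranked
WHERE ranked.rn = 1;

Note that the range predicate still checks every logontable row against every ipv4geotable row, which is the scalability concern raised above; the window function only controls which of the matching rows is kept.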
Re: Unsuscribe
Payal Radheshamji Agrawal, to unsubscribe please send a message to user-unsubscr...@hive.apache.org as described here: Mailing Lists http://hive.apache.org/mailing_lists.html. Thanks. -- Lefty On Mon, Jul 6, 2015 at 8:46 AM, Payal Radheshamji Agrawal payal.agra...@datametica.com wrote:
Re: Hive With tez
Regarding the number of mapper tasks, Hive on Tez is very similar to Hive on MapReduce. One difference is that Hive on Tez can group splits together, which may use fewer tasks than MapReduce. What issues did you see when you used Hive on Tez?

On Sun, Jul 5, 2015 at 10:39 PM, saurabh mpp.databa...@gmail.com wrote: Hi, We are in the process of exploring Tez for Hive 0.14 and need some pointers to start on Hive with Tez. For example, in Hive the HDFS block size plays a vital role in determining the number of mappers, and the independent execution of mappers can accelerate processing substantially. I understand this is a very vast topic and cannot be covered fully here, but some quick pointers would be helpful. I am currently working on query vectorization and CBO with ORC tables. Thanks, Saurabh

--
Best Regards
Jeff Zhang
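As a starting point, a minimal sketch of the settings commonly used to influence Tez split grouping from within a Hive session; the byte values are illustrative only, and suitable numbers depend on the data layout and cluster:

set hive.execution.engine=tez;

-- Tez groups input splits into tasks; these bounds (in bytes) steer how many
-- splits are grouped into a single task, and therefore how many tasks run.
set tez.grouping.min-size=134217728;    -- ~128 MB, illustrative
set tez.grouping.max-size=1073741824;   -- ~1 GB, illustrative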