Re: Hive BeeLine

2015-07-06 Thread Trainee Bingo
Hi Noam,

The file exists on my machine. I can run cat, ls, ll, etc. on it.

On Mon, Jul 6, 2015 at 2:31 AM, Noam Hasson noam.has...@kenshoo.com wrote:

 Just making sure: LOAD DATA LOCAL INPATH loads files from your local file
 system. Did you make sure the file exists on your machine?

 On Mon, Jul 6, 2015 at 12:15 PM, Trainee Bingo trainee1...@gmail.com
 wrote:

 Hi Users,

 I have Hive and HiveServer2 on the same machine. When I run *LOAD DATA
 LOCAL INPATH* using BeeLine I get an Invalid Path error, but LOAD DATA
 INPATH succeeds.

 Can anyone please tell me why LOCAL INPATH does not work?



 Thanks,
 Trainee.



 This e-mail, as well as any attached document, may contain material which
 is confidential and privileged and may include trademark, copyright and
 other intellectual property rights that are proprietary to Kenshoo Ltd,
  its subsidiaries or affiliates (Kenshoo). This e-mail and its
 attachments may be read, copied and used only by the addressee for the
 purpose(s) for which it was disclosed herein. If you have received it in
 error, please destroy the message and any attachment, and contact us
 immediately. If you are not the intended recipient, be aware that any
 review, reliance, disclosure, copying, distribution or use of the contents
 of this message without Kenshoo's express permission is strictly prohibited.


Hive BeeLine

2015-07-06 Thread Trainee Bingo
Hi Users,

I have Hive and HiveServer2 on the same machine. When I run *LOAD DATA
LOCAL INPATH* using BeeLine I get an Invalid Path error, but LOAD DATA
INPATH succeeds.

Can anyone please tell me why LOCAL INPATH does not work?



Thanks,
Trainee.


Re: Hive BeeLine

2015-07-06 Thread Noam Hasson
Just making sure: LOAD DATA LOCAL INPATH loads files from your local file
system. Did you make sure the file exists on your machine?
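
For reference, a minimal sketch of the two forms being compared. The table name and paths are hypothetical. With Beeline the statement is executed by HiveServer2, so as far as I know the LOCAL path is resolved on the HiveServer2 host and has to be readable by the user the server runs as:

```sql
-- Hypothetical table and paths, for illustration only.

-- LOCAL: the path is read from a local file system (the HiveServer2 host
-- when the statement is submitted through Beeline).
LOAD DATA LOCAL INPATH '/tmp/sampledata' INTO TABLE example_table;

-- Without LOCAL: the path is read from the cluster file system (e.g. HDFS),
-- and the file is moved into the table's location.
LOAD DATA INPATH '/user/trainee/sampledata' INTO TABLE example_table;
```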





Re: Is hive 0.13 index working fine on partition tables?

2015-07-06 Thread Jim Green
Does anyone know the JIRAs related to this issue?

On Mon, Jun 29, 2015 at 2:35 PM, Jim Green openkbi...@gmail.com wrote:

 Hi Team,

 On Hive 0.13, I have a minimal reproduction of an issue with an index on a
 partitioned table:
 CREATE TABLE test_partition_index(
 id1 bigint,
 id2 bigint,
 id3 bigint)
 PARTITIONED BY (
 dt string)
 row format delimited fields terminated by ',';

 cat sampledata
 111,222,333

 LOAD DATA LOCAL INPATH 'sampledata' OVERWRITE INTO TABLE
 test_partition_index PARTITION (dt='20150101');
 LOAD DATA LOCAL INPATH 'sampledata' OVERWRITE INTO TABLE
 test_partition_index PARTITION (dt='20150102');

 CREATE INDEX test_partition_index_idx ON TABLE test_partition_index (id1)
 AS 'COMPACT' WITH DEFERRED REBUILD;
 ALTER INDEX test_partition_index_idx ON test_partition_index REBUILD;
 set hive.optimize.index.filter=true;
 set hive.optimize.index.filter.compact.minsize=1;
 select * from test_partition_index where dt in (20150101) and id1=111 ;

 The error is:
 Number of reduce tasks is set to 0 since there's no reduce operator
 java.io.IOException: cannot find dir =
 xxx:/user/hive/warehouse/test_partition_index/dt=20150102/sampledata in
 pathToPartitionInfo:
 [xxx:/user/hive/warehouse/test_partition_index/dt=20150101]
 at
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:344)
 at
 org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.doGetSplits(HiveIndexedInputFormat.java:81)
 at
 org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:149)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:135)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

 Is this issue fixed in the latest version of Hive?
 If so, which JIRA is related?
 Thanks.

 --
 Thanks,
 www.openkb.info
 (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)






Limiting outer join

2015-07-06 Thread Bennie Leo
 Hi,
 
In the following query, is it possible to limit the number of entries returned
by an outer join to a single value? I want to obtain a single country from
ipv4geotable for each entry in logontable.

CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM 
(SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN
(SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON isIpv4(logon.IP) 
WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;
 
For instance, if I had the IP W.X.Y.Z in logontable, and W.X.Y.Z fell
in the range of both Italy and Spain in ipv4geotable, then I would like to
associate it with Italy only.

I've tried adding LIMIT 1 to the second subquery:

(SELECT StartIp, EndIp, Country FROM ipv4geotable LIMIT 1) ipv4 ON isIpv4(logon.IP)

but this is wrong, since the WHERE clause has to traverse all IPs. Limiting the
WHERE clause doesn't help either.
Any ideas?
 
Thank you!
B
 

  

Re: Limiting outer join

2015-07-06 Thread Gopal Vijayaraghavan

 In the following query, is it possible to limit the number of entries
 returned by an outer join to a single value? I want to obtain a single
 country from ipv4geotable for each entry in logontable.

Yes, the PTF DENSE_RANK()/ROW_NUMBER() basically gives you that: you can
read the first row out of each logon.IP partition, except there's no way to
force which country wins over the other without an ORDER BY country in the
OVER() clause as well.
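
A minimal sketch of that approach, not a tested query. It reuses the table and column names from the question, assumes the range predicate is meant as StartIp <= IP <= EndIp, and lets the alphabetically first country win. isIpv4 is the poster's own function.

```sql
-- Sketch only: one row per distinct logon IP, choosing the country that
-- sorts first when several ranges match (so Italy wins over Spain).
SELECT IP, Country
FROM (
  SELECT logon.IP     AS IP,
         ipv4.Country AS Country,
         ROW_NUMBER() OVER (PARTITION BY logon.IP
                            ORDER BY ipv4.Country) AS rn
  FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon
  LEFT OUTER JOIN
       (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4
    ON isIpv4(logon.IP)
  -- As in the original query, this filter drops IPs with no matching range.
  WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp
) ranked
WHERE rn = 1;
```

Because the rows are partitioned by logon.IP, repeated logons from the same IP collapse into a single output row.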

That said, producing 1 row per group will only make it slower: because of
the distributed nature of the SQL engine, the reduction of data happens
after an ordering shuffle.

You're doing range joins in a SQL engine without theta joins, and MapReduce
had no way to implement those at runtime (Tez has, with EdgeManager
plugins).

The easiest/traditional approach to geo-IP lookups is a compact
UDF model without any joins at all.
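
From the query side, such a lookup might be used as sketched below. The jar path, class name and function name are hypothetical placeholders. This assumes you have written or obtained a UDF that maps an IPv4 string to a country by searching an in-memory range table.

```sql
-- Hypothetical jar, class and function names, for illustration only.
ADD JAR /path/to/geoip-udf.jar;
CREATE TEMPORARY FUNCTION geoip_country AS 'com.example.hive.udf.GeoIpCountry';

-- No join: each row is resolved by the UDF inside the map task.
CREATE TABLE ipv4table AS
SELECT IP, geoip_country(IP) AS Country
FROM logontable
WHERE isIpv4(IP);
```

The UDF returns exactly one value per input row, so the question of which country wins is decided inside the UDF rather than by the query.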

There are some old threads discussing this as a built-in, and some code
(with potential licensing issues):
http://markmail.org/message/w54j4upwg2wbh3xg

Cheers,
Gopal




Re: Unsuscribe

2015-07-06 Thread Lefty Leverenz
Payal Radheshamji Agrawal, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described on the Mailing Lists page:
http://hive.apache.org/mailing_lists.html


Thanks.

-- Lefty

On Mon, Jul 6, 2015 at 8:46 AM, Payal Radheshamji Agrawal 
payal.agra...@datametica.com wrote:





Re: Hive With tez

2015-07-06 Thread Jeff Zhang
Regarding the number of mapper tasks, Hive on Tez is very similar to Hive on
MapReduce. One difference is that Hive on Tez can group splits together,
which may use fewer tasks than MapReduce. What issues did you see when you
used Hive on Tez?
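
As a starting point, a minimal sketch of the session settings involved. The values are illustrative only, not recommendations. tez.grouping.min-size and tez.grouping.max-size bound the size in bytes of a grouped split.

```sql
-- Run the session on Tez instead of MapReduce.
set hive.execution.engine=tez;

-- Illustrative values only: lower and upper bounds (in bytes) on the size of
-- a grouped split. Larger groups mean fewer, bigger map tasks.
set tez.grouping.min-size=16777216;
set tez.grouping.max-size=1073741824;
```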

On Sun, Jul 5, 2015 at 10:39 PM, saurabh mpp.databa...@gmail.com wrote:

 Hi,

 We are in the process of exploring Tez for Hive 0.14 and need some pointers
 to get started with Hive on Tez.
 For example, in Hive the HDFS block size plays a vital role in determining the
 number of mappers, and independent execution of the mappers can then
 accelerate processing substantially.

 I understand this is a vast topic that cannot be covered fully, but some
 quick pointers would be helpful.

 I am currently working on:
 query vectorization and CBO with ORC tables.

 Thanks,
 Saurabh




-- 
Best Regards

Jeff Zhang