Re: join 2 tables located on different clusters

2015-06-25 Thread Alexander Pivovarov
I tried to reproduce the Wrong FS issue in several hive branches

branch-0.14 - works
branch-1.0 - works
branch-1.1 - throws exception

Looks like the error was introduced in 1.1.0 by the following change:
https://issues.apache.org/jira/browse/HIVE-9264

I opened a new JIRA for the issue:
https://issues.apache.org/jira/browse/HIVE-6
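For illustration, here is a toy Python sketch (purely illustrative, not the real Hadoop API) of the FileSystem.checkPath behavior behind the "Wrong FS" message: a filesystem bound to one URI rejects fully qualified paths whose scheme/authority point somewhere else, which is what happens when the encryption check asks the session's default filesystem about a table path on a different cluster.

```python
from urllib.parse import urlparse

# Simplified stand-in for Hadoop's FileSystem.checkPath: a filesystem
# bound to fs_uri rejects fully qualified paths on another authority.
class WrongFS(Exception):
    pass

def check_path(fs_uri: str, path: str) -> None:
    fs, p = urlparse(fs_uri), urlparse(path)
    # Scheme-less paths are resolved against fs_uri, so they pass.
    if p.scheme and (p.scheme, p.netloc) != (fs.scheme, fs.netloc):
        raise WrongFS(f"Wrong FS: {path}, expected: {fs_uri}")

local_fs = "hdfs://localhost:8020"
check_path(local_fs, "/tmp/et1")                  # ok: resolved locally
try:
    check_path(local_fs, "hdfs://mydev/tmp/et1")  # remote authority
except WrongFS as e:
    print(e)
```

The fix direction, then, would be for the encryption check to obtain the filesystem from the table path itself rather than from the session default.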


On Wed, Jun 24, 2015 at 4:08 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 I tried on a local hadoop/hive instance (hive is the latest from the master
 branch)

 mydev is an HA alias to a remote HA namenode.

 $ hadoop fs -ls hdfs://mydev/tmp/et1
 Found 1 items
 -rw-r--r--   3 myapp hadoop 16 2015-06-24 16:05
 hdfs://mydev/tmp/et1/et1file

 $ hive

 hive> CREATE TABLE et1 (
   a string
 ) stored as textfile
 LOCATION 'hdfs://mydev/tmp/et1';

 hive> select * from et1;

 15/06/24 16:01:08 [main]: ERROR parse.CalcitePlanner:
 org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if
 hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException:
 Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1870)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1947)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1979)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1792)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
 at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
 at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
 ... 26 more

 FAILED: SemanticException Unable to determine if hdfs://mydev/tmp/et1 is
 encrypted: java.lang.IllegalArgumentException: Wrong FS:
 hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
 15/06/24 16:01:08 [main]: ERROR ql.Driver: FAILED: SemanticException
 Unable to determine if hdfs://mydev/tmp/et1 is encrypted:
 java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1,
 expected: hdfs://localhost:8020
 org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if
 hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException:
 Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1850)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
 ...
 

Re: ApacheCON EU HBase Track Submissions

2015-06-25 Thread Nick Dimiduk
On Thu, Jun 25, 2015 at 12:13 PM, Owen O'Malley omal...@apache.org wrote:

 Actually, Apache: Big Data Europe CFP closes 10 July.

 The CFP is:
 http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp


My message was regarding a track for ApacheCON EU Core, for which the CFP
closes July 1. Communities are not given the opportunity to organize tracks
for the big data conference.

On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:

  Hello developers, users, speakers,
 
  As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a
  HBase: NoSQL + SQL track come together. The idea is to showcase the
  growing ecosystem of applications and tools built on top of and around
  Apache HBase. To have a track, we need content, and that's where YOU come
  in.
 
  CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
  submitted so we can pull together a full day of great HBase ecosystem
  talks! Already planning to submit a talk on Hive? Work in HBase and we'll
  get it promoted as part of the track!
 
  Thanks,
  Nick
 
  ApacheCON EU
  Sept 28 - Oct 2
  Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome
 city!)
  http://apachecon.eu/
  CFP link: http://events.linuxfoundation.org/cfp/dashboard
 



Cost based optimization

2015-06-25 Thread Raajay
Hello Everyone,

A quick question on the cost-based optimization module in Hive. Does the
latest version support query plan generation with alternate join orders?

Thanks
Raajay


EXPORTing multiple partitions

2015-06-25 Thread Brian Jeltema
Using Hive .13, I would like to export multiple partitions of a table, 
something conceptually like:

   EXPORT TABLE foo PARTITION (id=1,2,3) to ‘path’

Is there any way to accomplish this?

Brian

Re: EXPORTing multiple partitions

2015-06-25 Thread Brian Jeltema
Answering my own question:

  create table foo_copy like foo;
  insert into foo_copy partition (id) select * from foo where id in (1,2,3);
  export table foo_copy to ‘path’;
  drop table foo_copy;

It would be nice if export could do this automatically, though.
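As a sketch of another workaround (assuming it is acceptable to produce one archive per partition): since EXPORT TABLE takes a single partition spec per statement, the statements can simply be generated per partition value. The helper below is hypothetical; table and column names are just the ones from this example.

```python
# Hypothetical helper: build one EXPORT statement per partition value,
# since EXPORT TABLE accepts only a single partition spec per statement.
def export_statements(table, part_col, values, base_path):
    return [
        f"EXPORT TABLE {table} PARTITION ({part_col}={v}) "
        f"TO '{base_path}/{part_col}={v}';"
        for v in values
    ]

for stmt in export_statements("foo", "id", [1, 2, 3], "/tmp/foo_export"):
    print(stmt)  # each line could be passed to `hive -e "..."`
```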

Brian

On Jun 25, 2015, at 11:34 AM, Brian Jeltema brian.jelt...@digitalenvoy.net 
wrote:

 Using Hive .13, I would like to export multiple partitions of a table, 
 something conceptually like:
 
   EXPORT TABLE foo PARTITION (id=1,2,3) to ‘path’
 
 Is there any way to accomplish this?
 
 Brian



ApacheCON EU HBase Track Submissions

2015-06-25 Thread Nick Dimiduk
Hello developers, users, speakers,

As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see an
HBase: NoSQL + SQL track come together. The idea is to showcase the
growing ecosystem of applications and tools built on top of and around
Apache HBase. To have a track, we need content, and that's where YOU come
in.

CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
submitted so we can pull together a full day of great HBase ecosystem
talks! Already planning to submit a talk on Hive? Work in HBase and we'll
get it promoted as part of the track!

Thanks,
Nick

ApacheCON EU
Sept 28 - Oct 2
Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
http://apachecon.eu/
CFP link: http://events.linuxfoundation.org/cfp/dashboard


Re: ApacheCON EU HBase Track Submissions

2015-06-25 Thread Owen O'Malley
Actually, Apache: Big Data Europe CFP closes 10 July.

The CFP is:
http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp

On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:

 Hello developers, users, speakers,

 As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a
 HBase: NoSQL + SQL track come together. The idea is to showcase the
 growing ecosystem of applications and tools built on top of and around
 Apache HBase. To have a track, we need content, and that's where YOU come
 in.

 CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
 submitted so we can pull together a full day of great HBase ecosystem
 talks! Already planning to submit a talk on Hive? Work in HBase and we'll
 get it promoted as part of the track!

 Thanks,
 Nick

 ApacheCON EU
 Sept 28 - Oct 2
 Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
 http://apachecon.eu/
 CFP link: http://events.linuxfoundation.org/cfp/dashboard



Re: Cost based optimization

2015-06-25 Thread John Pullokkaran
Hive does look into alternate join orders and picks the best plan that
minimizes cost.
It uses a greedy algorithm to enumerate plan space.
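To illustrate what greedy enumeration means here, below is a toy sketch with a made-up cost model (cardinality product times a fixed selectivity) rather than Hive/Calcite's actual cost functions: at each step, join the pair of relations with the smallest estimated result.

```python
# Toy greedy join enumeration (illustration only, not Hive/Calcite code):
# repeatedly pick the pair of relations whose join has the smallest
# estimated output size under a fixed, made-up selectivity.
def greedy_join_order(cards, selectivity=0.1):
    rels = dict(cards)  # relation name -> estimated row count
    order = []
    while len(rels) > 1:
        names = list(rels)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                cost = rels[a] * rels[b] * selectivity
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        del rels[a], rels[b]
        order.append((a, b))
        rels[f"({a} JOIN {b})"] = cost  # composite relation for next round
    return order

print(greedy_join_order({"orders": 1_000_000, "customers": 10_000, "nations": 25}))
```

A greedy walk like this looks at O(n^2) pairs per step instead of exploring all join orders, which is why it scales but can miss the global optimum.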

Thanks
John

From: Raajay raaja...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, June 25, 2015 at 2:30 PM
To: user@hive.apache.org
Subject: Cost based optimization

Hello Everyone,

A quick question on the cost-based optimization module in Hive. Does the latest
version support query plan generation with alternate join orders?

Thanks
Raajay


Re: EXPORTing multiple partitions

2015-06-25 Thread Xuefu Zhang
Hi Brian,

If you think that is useful, please feel free to create a JIRA requesting
for it.

Thanks,
Xuefu

On Thu, Jun 25, 2015 at 10:36 AM, Brian Jeltema 
brian.jelt...@digitalenvoy.net wrote:

 Answering my own question:

   create table foo_copy like foo;
   insert into foo_copy partition (id) select * from foo where id in
 (1,2,3);
   export table foo_copy to ‘path’;
   drop table foo_copy;

 It would be nice if export could do this automatically, though.

 Brian

 On Jun 25, 2015, at 11:34 AM, Brian Jeltema 
 brian.jelt...@digitalenvoy.net wrote:

  Using Hive .13, I would like to export multiple partitions of a table,
 something conceptually like:
 
EXPORT TABLE foo PARTITION (id=1,2,3) to ‘path’
 
  Is there any way to accomplish this?
 
  Brian




Hive indexing optimization

2015-06-25 Thread Bennie Leo




Hi,
 
I am attempting to optimize a query using indexing. My current query converts 
an ipv4 address to a country using a geolocation table. However, the 
geolocation table is fairly large and the query takes an impractical amount of 
time. I have created indexes and set the binary search parameter to true 
(default), but the query is not faster. 
 
Here is how I set up indexing:


DROP INDEX IF EXISTS ipv4indexes ON ipv4geotable;
CREATE INDEX ipv4indexes 
ON TABLE ipv4geotable (StartIp, EndIp)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES  ('hive.index.compact.binary.search'='true');
 
ALTER INDEX ipv4indexes ON ipv4geotable REBUILD;
 
And here is my query:
 
DROP TABLE IF EXISTS ipv4table;
CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM 
(SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN
(SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON isIpv4(logon.IP) 
WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;
 
What the query is doing is extracting an IP from logontable and finding in 
which range it lies within the geolocation table (which is sorted). When a 
range is found, the corresponding country is returned. The problem is that Hive 
goes through the whole table row by row rather than performing a smart search 
(ex: binary search). 
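The lookup being described can be sketched outside Hive: convert each dotted address to an integer, keep the table sorted by StartIp, and binary-search for the covering range. This is a plain Python illustration with made-up sample rows, not a Hive feature:

```python
import bisect
import ipaddress

# Sketch of the desired range lookup: rows sorted by StartIp, then a
# binary search instead of a row-by-row scan. Column names follow the
# example; the sample data is invented.
geo = [  # (StartIp, EndIp, Country) as integers, sorted by StartIp
    (int(ipaddress.IPv4Address("1.0.0.0")),   int(ipaddress.IPv4Address("1.0.0.255")),   "AU"),
    (int(ipaddress.IPv4Address("5.44.16.0")), int(ipaddress.IPv4Address("5.44.31.255")), "GB"),
]
starts = [s for s, _, _ in geo]

def country_of(ip: str):
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(starts, n) - 1  # last range starting <= ip
    if i >= 0 and geo[i][1] >= n:           # ip must not pass EndIp
        return geo[i][2]
    return None

print(country_of("1.0.0.42"))  # falls in the first range
print(country_of("9.9.9.9"))   # no covering range in this toy data
```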
 
Any suggestions on how to speed things up? 
 
Thank you,
B

  

Re: Hive indexing optimization

2015-06-25 Thread John Pullokkaran
Set hive.optimize.index.filter=true;

Thanks
John

From: Bennie Leo tben...@hotmail.com
Reply-To: user@hive.apache.org
Date: Thursday, June 25, 2015 at 5:48 PM
To: user@hive.apache.org
Subject: Hive indexing optimization


Hi,

I am attempting to optimize a query using indexing. My current query converts 
an ipv4 address to a country using a geolocation table. However, the 
geolocation table is fairly large and the query takes an impractical amount of 
time. I have created indexes and set the binary search parameter to true 
(default), but the query is not faster.

Here is how I set up indexing:

DROP INDEX IF EXISTS ipv4indexes ON ipv4geotable;
CREATE INDEX ipv4indexes
ON TABLE ipv4geotable (StartIp, EndIp)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('hive.index.compact.binary.search'='true');

ALTER INDEX ipv4indexes ON ipv4geotable REBUILD;

And here is my query:

DROP TABLE IF EXISTS ipv4table;
CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM
(SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN
(SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON isIpv4(logon.IP)
WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;

What the query is doing is extracting an IP from logontable and finding in 
which range it lies within the geolocation table (which is sorted). When a 
range is found, the corresponding country is returned. The problem is that Hive 
goes through the whole table row by row rather than performing a smart search 
(ex: binary search).

Any suggestions on how to speed things up?

Thank you,
B


Re: hive -e run tez query error

2015-06-25 Thread Jeff Zhang
The drop table command between these 2 queries fails due to a permission
issue. If I solve the permission issue, the tez error disappears.
But what confuses me is that the failure of the drop table command
should not affect the following tez jobs. Yet according to the hdfs-audit
logs, the tez staging directory is deleted between the first query and the
second query. Besides the audit log, I don't see the deletion of the tez
staging directory mentioned anywhere else, including the hive log and yarn
app log. Does anyone know whether hive will delete the tez staging directory
in this case?


Here's the hive log and hdfs-audit log.

*hive.log*
2015-06-25 15:54:33,623 INFO  [main]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(732)) - 0: drop_table : db=TESTDEFAULT
tbl=tmp_recomm_prdt_detail0
2015-06-25 15:54:33,623 INFO  [main]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(358)) - ugi=lujian  ip=unknown-ip-addr
 cmd=drop_table : db=TESTDEFAULT tbl=tmp_recomm_prdt_detail0
2015-06-25 15:54:33,658 INFO  [main]: metastore.hivemetastoressimpl
(HiveMetaStoreFsImpl.java:deleteDir(41)) - deleting  hdfs://
yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
2015-06-25 15:54:33,660 INFO  [main]: fs.TrashPolicyDefault
(TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration:
Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
2015-06-25 15:54:33,669 ERROR [main]: hive.log
(MetaStoreUtils.java:logAndThrowMetaException(1173)) - Got exception:
java.io.IOException Failed to move to trash: hdfs://
yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
java.io.IOException: Failed to move to trash: hdfs://
yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
at
org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:109)
... (full stacktrace removed)
2015-06-25 15:54:33,670 ERROR [main]: hive.log
(MetaStoreUtils.java:logAndThrowMetaException(1174)) - Converting exception
to MetaException
2015-06-25 15:54:33,670 ERROR [main]: metastore.HiveMetaStore
(HiveMetaStore.java:deleteTableData(1557)) - Failed to delete table
directory: hdfs://
yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
Got exception: java.io.IOException Failed to move to trash: hdfs://
yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
2015-06-25 15:54:33,670 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=runTasks
start=1435218873582 end=1435218873670 duration=88
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,670 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=Driver.execute
start=1435218873582 end=1435218873670 duration=88
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: ql.Driver
(SessionState.java:printInfo(852)) - OK
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=releaseLocks
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=releaseLocks
start=1435218873671 end=1435218873671 duration=0
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=Driver.run
start=1435218873554 end=1435218873671 duration=117
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: CliDriver
(SessionState.java:printInfo(852)) - Time taken: 0.117 seconds
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=Driver.run
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=TimeToSubmit
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=compile
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,672 INFO  [main]: log.PerfLogger
(PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=parse
from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,672 INFO  [main]: parse.ParseDriver
(ParseDriver.java:parse(185)) - Parsing command:   create table
TESTDEFAULT.tmp_recomm_prdt_detail0(track_id bigint,cart_time
string,product_id bigint, session_id string,gu_id string,url string,referer
string,province_id bigint,page_type bigint, button_position
string,link_position string,ip string,m_session_id string,link_id bigint,
button_id bigint,algorithm_id bigint,from_value string,ref_value
string,step string,platform_type_id int,pm_info_id bigint,refer_page_value
string,url_page_value string)
2015-06-25 15:54:33,674 INFO  [main]: parse.ParseDriver
(ParseDriver.java:parse(206)) - Parse