Re: join 2 tables located on different clusters
I tried to reproduce the Wrong FS issue on several Hive branches:

branch-0.14 - works
branch-1.0 - works
branch-1.1 - throws the exception

Looks like the error was introduced in 1.1.0 by the following: https://issues.apache.org/jira/browse/HIVE-9264
I opened a new JIRA for the issue: https://issues.apache.org/jira/browse/HIVE-6

On Wed, Jun 24, 2015 at 4:08 PM, Alexander Pivovarov apivova...@gmail.com wrote:

> I tried on a local hadoop/hive instance (hive is the latest from the master branch). mydev is an HA alias to a remote HA namenode.
>
> $ hadoop fs -ls hdfs://mydev/tmp/et1
> Found 1 items
> -rw-r--r--   3 myapp hadoop   16 2015-06-24 16:05 hdfs://mydev/tmp/et1/et1file
>
> $ hive
> hive> CREATE TABLE et1 ( a string ) stored as textfile LOCATION 'hdfs://mydev/tmp/et1';
> hive> select * from et1;
>
> 15/06/24 16:01:08 [main]: ERROR parse.CalcitePlanner: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1870)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1947)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1979)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1792)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
>     at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
>     at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1210)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
>     ... 26 more
>
> FAILED: SemanticException Unable to determine if hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
> 15/06/24 16:01:08 [main]: ERROR ql.Driver: FAILED: SemanticException Unable to determine if hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
> org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1850)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
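For context, the "Wrong FS" message comes from a scheme/authority comparison between the table's path and the filesystem the check runs against. A rough sketch of that logic (a hypothetical helper in Python, not Hadoop's actual checkPath code):

```python
from urllib.parse import urlparse

def check_path(path: str, default_fs: str) -> None:
    """Mimic the shape of FileSystem.checkPath: reject a fully-qualified
    path whose scheme/authority differ from the filesystem being used."""
    p, d = urlparse(path), urlparse(default_fs)
    if p.scheme and (p.scheme, p.netloc) != (d.scheme, d.netloc):
        raise ValueError(f"Wrong FS: {path}, expected: {default_fs}")

# Paths on the default filesystem pass the check.
check_path("hdfs://localhost:8020/tmp/x", "hdfs://localhost:8020")

# The failing case from the log: table path on the remote HA alias,
# checked against the local default filesystem.
try:
    check_path("hdfs://mydev/tmp/et1", "hdfs://localhost:8020")
except ValueError as e:
    print(e)  # Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
```

The bug report above boils down to Hive running this kind of check against the local default filesystem instead of the filesystem the table's LOCATION actually belongs to.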
Re: ApacheCON EU HBase Track Submissions
My message was regarding a track for ApacheCON EU Core, for which the CFP closes July 1. Communities are not given the opportunity to organize tracks for the big data conference.

On Thu, Jun 25, 2015 at 12:13 PM, Owen O'Malley omal...@apache.org wrote:

> Actually, the Apache: Big Data Europe CFP closes 10 July. The CFP is: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp
>
> On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:
>
>> Hello developers, users, speakers,
>>
>> As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see an HBase: NoSQL + SQL track come together. The idea is to showcase the growing ecosystem of applications and tools built on top of and around Apache HBase.
>>
>> To have a track, we need content, and that's where YOU come in. The CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks submitted so we can pull together a full day of great HBase ecosystem talks! Already planning to submit a talk on Hive? Work in HBase and we'll get it promoted as part of the track!
>>
>> Thanks,
>> Nick
>>
>> ApacheCON EU, Sept 28 - Oct 2
>> Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
>> http://apachecon.eu/
>> CFP link: http://events.linuxfoundation.org/cfp/dashboard
Cost based optimization
Hello everyone,

A quick question on the cost-based optimization module in Hive: does the latest version support query-plan generation with alternate join orders?

Thanks
Raajay
EXPORTing multiple partitions
Using Hive 0.13, I would like to export multiple partitions of a table, something conceptually like:

EXPORT TABLE foo PARTITION (id=1,2,3) TO 'path';

Is there any way to accomplish this?

Brian
Re: EXPORTing multiple partitions
Answering my own question:

create table foo_copy like foo;
insert into foo_copy partition (id) select * from foo where id in (1,2,3);
export table foo_copy to 'path';
drop table foo_copy;

It would be nice if EXPORT could do this automatically, though.

Brian

On Jun 25, 2015, at 11:34 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote:

> Using Hive 0.13, I would like to export multiple partitions of a table, something conceptually like:
>
> EXPORT TABLE foo PARTITION (id=1,2,3) TO 'path';
>
> Is there any way to accomplish this?
>
> Brian
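An alternative that avoids copying the table is to export each partition separately to its own directory. A small sketch that generates one EXPORT statement per partition (the table name and export path are hypothetical); the output could be saved to a file and run with hive -f:

```python
# Sketch: emit one EXPORT statement per partition, instead of copying
# the table first. Table "foo" and the target path are made-up examples;
# save the printed statements to a file and run them with "hive -f".
partitions = [1, 2, 3]
statements = [
    f"EXPORT TABLE foo PARTITION (id={p}) TO '/tmp/foo_export/id={p}';"
    for p in partitions
]
print("\n".join(statements))
```

Each partition then lands in its own export directory, at the cost of one EXPORT job per partition.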
ApacheCON EU HBase Track Submissions
Hello developers, users, speakers,

As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see an HBase: NoSQL + SQL track come together. The idea is to showcase the growing ecosystem of applications and tools built on top of and around Apache HBase.

To have a track, we need content, and that's where YOU come in. The CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks submitted so we can pull together a full day of great HBase ecosystem talks! Already planning to submit a talk on Hive? Work in HBase and we'll get it promoted as part of the track!

Thanks,
Nick

ApacheCON EU, Sept 28 - Oct 2
Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
http://apachecon.eu/
CFP link: http://events.linuxfoundation.org/cfp/dashboard
Re: ApacheCON EU HBase Track Submissions
Actually, the Apache: Big Data Europe CFP closes 10 July. The CFP is: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp

On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:

> Hello developers, users, speakers,
>
> As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see an HBase: NoSQL + SQL track come together. The idea is to showcase the growing ecosystem of applications and tools built on top of and around Apache HBase.
>
> To have a track, we need content, and that's where YOU come in. The CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks submitted so we can pull together a full day of great HBase ecosystem talks! Already planning to submit a talk on Hive? Work in HBase and we'll get it promoted as part of the track!
>
> Thanks,
> Nick
>
> ApacheCON EU, Sept 28 - Oct 2
> Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
> http://apachecon.eu/
> CFP link: http://events.linuxfoundation.org/cfp/dashboard
Re: Cost based optimization
Hive does look into alternate join orders and picks the plan that minimizes cost. It uses a greedy algorithm to enumerate the plan space.

Thanks
John

From: Raajay raaja...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, June 25, 2015 at 2:30 PM
To: user@hive.apache.org
Subject: Cost based optimization

> Hello everyone,
>
> A quick question on the cost-based optimization module in Hive: does the latest version support query-plan generation with alternate join orders?
>
> Thanks
> Raajay
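As an illustration of the idea (not Hive's actual Calcite code), a greedy join-ordering heuristic starts from the smallest relation and repeatedly joins in whichever remaining relation yields the cheapest intermediate result. A toy sketch with made-up cardinalities and a uniform join selectivity:

```python
def greedy_join_order(cards, join_selectivity):
    """Greedy join ordering sketch. cards maps relation -> row count;
    join_selectivity(a, b) estimates the fraction of the cross product
    that survives joining a with b. For simplicity, cost is estimated
    against the most recently joined relation only."""
    remaining = dict(cards)
    # Seed with the smallest relation.
    cur = min(remaining, key=remaining.get)
    order, cur_rows = [cur], remaining.pop(cur)
    while remaining:
        # Pick the relation whose join yields the fewest output rows.
        best = min(
            remaining,
            key=lambda r: cur_rows * remaining[r] * join_selectivity(order[-1], r),
        )
        cur_rows = cur_rows * remaining.pop(best) * join_selectivity(order[-1], best)
        order.append(best)
    return order

# Toy example: three tables with made-up sizes and uniform selectivity.
tables = {"orders": 1_000_000, "customers": 10_000, "nations": 25}
order = greedy_join_order(tables, lambda a, b: 0.001)
print(order)  # ['nations', 'customers', 'orders']
```

Greedy enumeration considers far fewer orders than exhaustive dynamic programming, which is the trade-off a cost-based optimizer makes to keep planning time bounded on many-way joins.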
Re: EXPORTing multiple partitions
Hi Brian,

If you think that would be useful, please feel free to create a JIRA requesting it.

Thanks,
Xuefu

On Thu, Jun 25, 2015 at 10:36 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote:

> Answering my own question:
>
> create table foo_copy like foo;
> insert into foo_copy partition (id) select * from foo where id in (1,2,3);
> export table foo_copy to 'path';
> drop table foo_copy;
>
> It would be nice if EXPORT could do this automatically, though.
>
> Brian
>
> On Jun 25, 2015, at 11:34 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote:
>
>> Using Hive 0.13, I would like to export multiple partitions of a table, something conceptually like:
>>
>> EXPORT TABLE foo PARTITION (id=1,2,3) TO 'path';
>>
>> Is there any way to accomplish this?
>>
>> Brian
Hive indexing optimization
Hi,

I am attempting to optimize a query using indexing. My current query converts an IPv4 address to a country using a geolocation table. However, the geolocation table is fairly large and the query takes an impractical amount of time. I have created indexes and set the binary-search parameter to true (the default), but the query is not faster.

Here is how I set up indexing:

DROP INDEX IF EXISTS ipv4indexes ON ipv4geotable;
CREATE INDEX ipv4indexes
ON TABLE ipv4geotable (StartIp, EndIp)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('hive.index.compact.binary.search'='true');
ALTER INDEX ipv4indexes ON ipv4geotable REBUILD;

And here is my query:

DROP TABLE IF EXISTS ipv4table;
CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4
  ON isIpv4(logon.IP)
WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;

What the query does is extract an IP from logontable and find which range it lies within in the geolocation table (which is sorted). When a range is found, the corresponding country is returned. The problem is that Hive goes through the whole table row by row rather than performing a smart search (e.g. binary search).

Any suggestions on how to speed things up?

Thank you,
B
Re: Hive indexing optimization
Set hive.optimize.index.filter=true;

Thanks
John

From: Bennie Leo tben...@hotmail.com
Reply-To: user@hive.apache.org
Date: Thursday, June 25, 2015 at 5:48 PM
To: user@hive.apache.org
Subject: Hive indexing optimization

> Hi,
>
> I am attempting to optimize a query using indexing. My current query converts an IPv4 address to a country using a geolocation table. However, the geolocation table is fairly large and the query takes an impractical amount of time. I have created indexes and set the binary-search parameter to true (the default), but the query is not faster.
>
> Here is how I set up indexing:
>
> DROP INDEX IF EXISTS ipv4indexes ON ipv4geotable;
> CREATE INDEX ipv4indexes
> ON TABLE ipv4geotable (StartIp, EndIp)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> WITH DEFERRED REBUILD
> IDXPROPERTIES ('hive.index.compact.binary.search'='true');
> ALTER INDEX ipv4indexes ON ipv4geotable REBUILD;
>
> And here is my query:
>
> DROP TABLE IF EXISTS ipv4table;
> CREATE TABLE ipv4table AS
> SELECT logon.IP, ipv4.Country
> FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon
> LEFT OUTER JOIN (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4
>   ON isIpv4(logon.IP)
> WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp;
>
> What the query does is extract an IP from logontable and find which range it lies within in the geolocation table (which is sorted). When a range is found, the corresponding country is returned. The problem is that Hive goes through the whole table row by row rather than performing a smart search (e.g. binary search).
>
> Any suggestions on how to speed things up?
>
> Thank you,
> B
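Setting Hive's index handling aside, the lookup the original query needs is a classic binary search over sorted, non-overlapping ranges. A minimal sketch with Python's stdlib bisect module (toy ranges, with IPs as plain integers for simplicity):

```python
import bisect

# Toy, sorted, non-overlapping (start, end, country) ranges; real
# geolocation data would use IPs converted to integers first.
ranges = [(10, 19, "US"), (20, 29, "FR"), (40, 49, "JP")]
starts = [r[0] for r in ranges]

def lookup(ip):
    """Binary-search the last range whose start <= ip, then verify
    that ip also falls at or below that range's end."""
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and ranges[i][0] <= ip <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup(25))  # FR
print(lookup(35))  # None (falls in a gap between ranges)
```

This is the O(log n) per-row lookup the poster expects; the difficulty in Hive is that a non-equi range join has no equality key, so it degenerates to a row-by-row scan unless the data or query is restructured to provide one.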
Re: hive -e run tez query error
The drop table command between these 2 queries fails due to a permission issue. If I solve the permission issue, the Tez error disappears. But what confuses me is that I assumed the failure of the drop table command should not affect the following Tez jobs. From the hdfs-audit logs, the tez staging directory is deleted between the first query and the second query, yet apart from the audit log I don't see the deletion of the tez staging directory mentioned anywhere else, including the hive log and the yarn app log. Does anyone know whether hive will delete the tez staging directory in this case?

Here are the hive log and hdfs-audit log.

*hive.log*

2015-06-25 15:54:33,623 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(732)) - 0: drop_table : db=TESTDEFAULT tbl=tmp_recomm_prdt_detail0
2015-06-25 15:54:33,623 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(358)) - ugi=lujian ip=unknown-ip-addr cmd=drop_table : db=TESTDEFAULT tbl=tmp_recomm_prdt_detail0
2015-06-25 15:54:33,658 INFO [main]: metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - deleting hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
2015-06-25 15:54:33,660 INFO [main]: fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
2015-06-25 15:54:33,669 ERROR [main]: hive.log (MetaStoreUtils.java:logAndThrowMetaException(1173)) - Got exception: java.io.IOException Failed to move to trash: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
java.io.IOException: Failed to move to trash: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
    at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
    at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:109)
    ..
(remove the full stacktrace)
2015-06-25 15:54:33,670 ERROR [main]: hive.log (MetaStoreUtils.java:logAndThrowMetaException(1174)) - Converting exception to MetaException
2015-06-25 15:54:33,670 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:deleteTableData(1557)) - Failed to delete table directory: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0 Got exception: java.io.IOException Failed to move to trash: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/warehouse/testdefault.db/tmp_recomm_prdt_detail0
2015-06-25 15:54:33,670 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=runTasks start=1435218873582 end=1435218873670 duration=88 from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,670 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=Driver.execute start=1435218873582 end=1435218873670 duration=88 from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: ql.Driver (SessionState.java:printInfo(852)) - OK
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=releaseLocks start=1435218873671 end=1435218873671 duration=0 from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=Driver.run start=1435218873554 end=1435218873671 duration=117 from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: CliDriver (SessionState.java:printInfo(852)) - Time taken: 0.117 seconds
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,671 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,672 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver
2015-06-25 15:54:33,672 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: create table TESTDEFAULT.tmp_recomm_prdt_detail0(track_id bigint,cart_time string,product_id bigint,session_id string,gu_id string,url string,referer string,province_id bigint,page_type bigint,button_position string,link_position string,ip string,m_session_id string,link_id bigint,button_id bigint,algorithm_id bigint,from_value string,ref_value string,step string,platform_type_id int,pm_info_id bigint,refer_page_value string,url_page_value string)
2015-06-25 15:54:33,674 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(206)) - Parse