[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252677#comment-14252677 ]
Jason Dere commented on HIVE-9163:
----------------------------------

{quote}
+    if (!isSrcLocal) {
+      String[] srcfSplits = srcf.getName().split(":");
+      String[] destfSplits = destf.getName().split(":");
+      boolean srcdestSameFS = srcfSplits[0].equals(destfSplits[0]);
{quote}

I think you need more than this - this will just get the scheme ("wasb", "hdfs"), but hdfs://host1:9000 should also be considered a different filesystem than hdfs://host2:8888, even though they have the same FS scheme. How about using srcf.getFileSystem()/destf.getFileSystem() and checking the FS objects for equality?

One issue with doing a local copy of the files before copying to the destination FS is that if the source files are large (which could be possible for a temp table), this could fill all the space in the local filesystem. Is there a way to invoke distcp, or stream the data using FS.open()/FS.create() and copy the bytes rather than saving locally?

Also, is it possible to add a test case for this? Create 2 MiniMR FS instances, use Hive.moveFile() to move the file from one FS to the other.
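A minimal sketch of the distinction Jason is pointing at: the patch's `split(":")` check only compares schemes, while two filesystems are the same only when scheme AND authority (host:port) match. The class and helper names below are hypothetical; in Hive the URIs would come from `srcf.getFileSystem(conf).getUri()` / `destf.getFileSystem(conf).getUri()`, but the comparison itself is shown with plain `java.net.URI` so it is self-contained.

```java
import java.net.URI;

public class FsEqualitySketch {
    // Hypothetical check: two paths live on the same filesystem only if both
    // the scheme (hdfs/wasb) and the authority (host:port) match. The patch's
    // split(":")[0] check compares schemes only, so it wrongly treats
    // hdfs://host1:9000 and hdfs://host2:8888 as the same filesystem.
    static boolean sameFileSystem(URI src, URI dest) {
        return eq(src.getScheme(), dest.getScheme())
            && eq(src.getAuthority(), dest.getAuthority());
    }

    private static boolean eq(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // Same scheme, different authority: different filesystems.
        System.out.println(sameFileSystem(
            URI.create("hdfs://host1:9000"), URI.create("hdfs://host2:8888")));   // false
        // Different scheme: different filesystems (the wasb-vs-hdfs case here).
        System.out.println(sameFileSystem(
            URI.create("wasb://c@acct.blob.core.windows.net"),
            URI.create("hdfs://headnode0:9000")));                                // false
        // Same scheme and authority: same filesystem, rename is safe.
        System.out.println(sameFileSystem(
            URI.create("hdfs://headnode0:9000"),
            URI.create("hdfs://headnode0:9000/tmp")));                            // true
    }
}
```

For the cross-filesystem case, Hadoop's `FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf)` streams bytes filesystem-to-filesystem via `FS.open()`/`FS.create()` without staging the data on the local disk, which addresses the local-space concern above.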
> create temporary table will fail with wasb storage because MoveTask.moveFile
> tries to move data to hdfs dir instead of wasb dir
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9163
>                 URL: https://issues.apache.org/jira/browse/HIVE-9163
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-9163.1.patch, HIVE-9163.2.patch
>
>
> {quote}
> create temporary table s10k stored as orc as select * from studenttab10k;
> create temporary table v10k as select * from votertab10k;
> select registration
> from s10k s join v10k v
> on (s.name = v.name) join studentparttab30k p
> on (p.name = v.name)
> where s.age < 25 and v.age < 25 and p.age < 25;
> {quote}
> It fails because it tries to move data to hdfs dir instead of wasb dir:
> {quote}
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
> Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/
> Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0%
> 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec
> MapReduce Total cumulative CPU time: 4 seconds 421 msec
> Ended Job = job_1418224548060_0070
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001
> Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063
> Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 4.421 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 4 seconds 421 msec
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)