[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252677#comment-14252677 ]
Jason Dere commented on HIVE-9163:
----------------------------------

{quote}
+    if (!isSrcLocal) {
+      String[] srcfSplits = srcf.getName().split(":");
+      String[] destfSplits = destf.getName().split(":");
+      boolean srcdestSameFS = srcfSplits[0].equals(destfSplits[0]);
{quote}

I think you need more than this - this will just get the scheme ("wasb", "hdfs"), but hdfs://host1:9000 should also be considered a different filesystem than hdfs://host2:8888, even though they have the same FS scheme. How about using srcf.getFileSystem()/destf.getFileSystem() and checking the FS objects for equality?

One issue with doing a local copy of the files before copying to the destination FS is that if the source files are large (which could be possible for a temp table), this could fill all the space in the local filesystem. Is there a way to invoke distcp, or stream the data using FS.open()/FS.create() and copy the bytes rather than saving locally?

Also, is it possible to add a test case for this? Create 2 MiniMR FS instances, use Hive.moveFile() to move the file from one FS to the other.
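A minimal sketch of the distinction Jason is pointing at: the patch's `split(":")` check only compares schemes, while two filesystems are the same only when scheme AND authority (host:port) match. The class and helper names below are hypothetical; in Hive the URIs would come from `srcf.getFileSystem(conf).getUri()` / `destf.getFileSystem(conf).getUri()`, but the comparison itself is shown with plain `java.net.URI` so it is self-contained.

```java
import java.net.URI;

public class FsEqualitySketch {
    // Hypothetical check: two paths live on the same filesystem only if both
    // the scheme (hdfs/wasb) and the authority (host:port) match. The patch's
    // split(":")[0] check compares schemes only, so it wrongly treats
    // hdfs://host1:9000 and hdfs://host2:8888 as the same filesystem.
    static boolean sameFileSystem(URI src, URI dest) {
        return eq(src.getScheme(), dest.getScheme())
            && eq(src.getAuthority(), dest.getAuthority());
    }

    private static boolean eq(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // Same scheme, different authority: different filesystems.
        System.out.println(sameFileSystem(
            URI.create("hdfs://host1:9000"), URI.create("hdfs://host2:8888")));   // false
        // Different scheme: different filesystems (the wasb-vs-hdfs case here).
        System.out.println(sameFileSystem(
            URI.create("wasb://c@acct.blob.core.windows.net"),
            URI.create("hdfs://headnode0:9000")));                                // false
        // Same scheme and authority: same filesystem, rename is safe.
        System.out.println(sameFileSystem(
            URI.create("hdfs://headnode0:9000"),
            URI.create("hdfs://headnode0:9000/tmp")));                            // true
    }
}
```

For the cross-filesystem case, Hadoop's `FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf)` streams bytes filesystem-to-filesystem via `FS.open()`/`FS.create()` without staging the data on the local disk, which addresses the local-space concern above.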
> create temporary table will fail with wasb storage because MoveTask.moveFile
> tries to move data to hdfs dir instead of wasb dir
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9163
>                 URL: https://issues.apache.org/jira/browse/HIVE-9163
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-9163.1.patch, HIVE-9163.2.patch
>
>
> {quote}
> create temporary table s10k stored as orc as select * from studenttab10k;
> create temporary table v10k as select * from votertab10k;
> select registration
> from s10k s join v10k v
> on (s.name = v.name) join studentparttab30k p
> on (p.name = v.name)
> where s.age < 25 and v.age < 25 and p.age < 25;
> {quote}
> It fails because it tries to move data to hdfs dir instead of wasb dir:
> {quote}
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist
> 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
> Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/
> Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0%
> 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec
> MapReduce Total cumulative CPU time: 4 seconds 421 msec
> Ended Job = job_1418224548060_0070
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001
> Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063
> Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 4.421 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 4 seconds 421 msec
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)