[ https://issues.apache.org/jira/browse/HDFS-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909368#comment-13909368 ]

wangmeng commented on HDFS-5996:
--------------------------------

These days I am doing research on Spark and Shark and comparing them with 
MapReduce. I find that Spark/Shark have some advantages over MapReduce/Hive 
because of Spark's in-memory computing. As Hegel says: what is rational is 
actual, and what is actual is rational. So I think Spark cannot completely 
replace the MapReduce framework just yet; MapReduce must still have advantages 
of its own. But right now I cannot find an advantage that MapReduce has over 
Spark. Apart from the fact that Spark is a newer system and is less mature 
than MapReduce, is MapReduce's fault tolerance better than RDD lineage? 
So, can you analyze this? Thanks.




--

Best Regards
Name:    Wang Meng (Boy)
Major:   Software Engineering - Java, Shell, Python, Linux, Big Data, Hadoop, Hive, SQL on Hadoop, Warehouse
Degree:  Master
E-mail:  [email protected]   [email protected]
Tel:     13141202303 (Beijing)   18818272832 (Shanghai)
GitHub:    https://github.com/sjtufighter







> hadoop 1.1.2 HDFS write bug
> ----------------------------
>
>                 Key: HDFS-5996
>                 URL: https://issues.apache.org/jira/browse/HDFS-5996
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fuse-dfs
>    Affects Versions: 1.1.2
>         Environment: one master and three slaves, all of them normal
>            Reporter: wangmeng
>             Fix For: 1.1.2
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
>       I am a student from China; my research is on Hive data storage on 
> Hadoop. There is an HDFS write bug when I use the SQL statement: insert 
> overwrite table wangmeng select * from testTable (this SQL is translated 
> into N map-only jobs, with no reduce, each map corresponding to one HDFS 
> file written out to disk). No matter what value N is, there are always 
> some DFSDataOutputStream buffers that never get written to disk at the 
> end; for example, with N=160 files there may be about 5 write-failure 
> files. The size on disk of a write-failed HDFS file is always 0 bytes, 
> rather than some value between 0 and the correct size. No exceptions are 
> thrown, and the HDFS bytes-written statistics are absolutely correct.
>        When I debug, I find that the write-failed DFS buffers hold 
> absolutely correct values in memory, but the buffer still does not get 
> written to disk at the end, even though I call DFSDataOutputStream.flush() 
> and DFSDataOutputStream.close().
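>
>        For reference, here is a minimal, self-contained Java sketch of the 
> write path I am describing, written against the Hadoop 1.x FileSystem API. 
> This is not the actual Hive-generated code; the output path and the data 
> are made-up placeholders.
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class HdfsWriteSketch {
>       public static void main(String[] args) throws Exception {
>           Configuration conf = new Configuration();
>           FileSystem fs = FileSystem.get(conf);
>           // Illustrative output path; in my job each map task writes one such file.
>           Path out = new Path("/user/wangmeng/output/part-00000");
>
>           FSDataOutputStream stream = fs.create(out, true);
>           stream.write("one row of table data".getBytes("UTF-8"));
>           stream.sync();   // push buffered data to the datanodes (hflush() on newer versions)
>           stream.close();  // should persist the remaining buffer and finalize the file
>       }
>   }
>
>        Even with the sync() and close() calls in place, the failed files 
> still end up as 0 bytes on disk.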
>        I cannot find the reason these DFS buffers fail to be written. For 
> now I avoid the problem by using a temporary file: for example, if a DFS 
> buffer is supposed to be written to its final destination FINAL, I first 
> write the buffer to a temporary file TEM, and then move the TEM data to the 
> destination simply by changing the HDFS file path (a sketch of this 
> workaround follows below). This method avoids the DFS buffer write failure. 
> Now I want to fix this problem at the root, so how should I patch my code 
> for this problem, and is there anything I can do? Many thanks.
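>
>        This is roughly how my temporary-file workaround looks. Again it is 
> only a sketch; the class name, helper name, and path layout are illustrative 
> and not taken from my real code.
>
>   import java.io.IOException;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class TempFileWriteSketch {
>       // Write the data to a temporary file first, then rename it onto the
>       // final destination path.
>       public static void writeViaTemp(FileSystem fs, Path finalPath, byte[] data)
>               throws IOException {
>           Path tmp = new Path(finalPath.getParent(), "_tmp_" + finalPath.getName());
>           FSDataOutputStream out = fs.create(tmp, true);
>           try {
>               out.write(data);
>           } finally {
>               out.close();
>           }
>           // The rename is a NameNode metadata operation; with this extra step
>           // the final files no longer come out empty in my tests.
>           if (!fs.rename(tmp, finalPath)) {
>               throw new IOException("rename " + tmp + " -> " + finalPath + " failed");
>           }
>       }
>   }
>
>        This works for me, but it is only a workaround, so I would still 
> like to understand and fix the underlying write failure.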



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
