[ https://issues.apache.org/jira/browse/HADOOP-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477594 ]

Milind Bhandarkar commented on HADOOP-1053:
-------------------------------------------

I agree with the points that David has made, i.e. the description of this Jira
issue should be modified to remove the "hadoop-dependency" as a goal, and instead
aim at making Hadoop record I/O functionally modular (thanks for those well-chosen
words, David. After all, proper usage of words does matter for easing objections
and reaching consensus).

In the context of this discussion, I would like to know the opinion of the
hadoop-committers: if a significant user base had asked to use, say, only Hadoop
DFS, independent of the Hadoop map-reduce framework, would they have agreed to
separate the two in order to gain a huge base of users? This may seem like a
hypothetical question, but I have been asked it by a lot of people, so I can
assure you all that it is indeed not hypothetical.

The particular question that I was asked by an aspiring web 2.0 company was
this: "I have been following the hadoop-dev list for the last three months, and
I have noticed that while the DFS side of Hadoop has been adding features, the
map-reduce side has been fixing serious bugs. In any case, I do not need
map-reduce, but I believe DFS would be a great feature to have. Is there a way
to use just DFS in my project, without the rest of Hadoop?"

My answer to that was: yes, of course. DFS is functionally modular (p.s. they
do not want SequenceFile; they have their own ideas about storing key-value
pairs). Was my answer correct? Should I instead have told them, "No, you have
to use the map-reduce framework even if you do not need it"? (P.S. I suggested
a way for them to repackage the jar so that they can use only DFS, and to watch
only the dfs-component traffic on hadoop-dev.)
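
To make the kind of usage they had in mind concrete, here is a rough sketch of
a client that talks to DFS purely through the org.apache.hadoop.fs API, with no
map-reduce classes on the code path. The namenode address and file path below
are made up for the example; normally fs.default.name comes from hadoop-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsOnlyClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical namenode address; normally read from hadoop-site.xml.
    conf.set("fs.default.name", "namenode.example.com:9000");

    FileSystem fs = FileSystem.get(conf);

    // Write a small file into DFS; no map-reduce classes are involved.
    Path file = new Path("/user/demo/hello.txt");
    FSDataOutputStream out = fs.create(file);
    out.writeUTF("stored via DFS only");
    out.close();

    // Read it back.
    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();

    fs.close();
  }
}

The only jar such a client needs on its classpath is one containing the fs,
dfs, conf, io, and util packages, which is exactly the repackaging I suggested.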

So, now this comes back to me in the record I/O context. Why can't I say the
same to Hadoop record I/O users? (About three users, two of them startup
founders that I happen to know, have asked me this.)

But long-term vision aside, I believe this patch is an important step forward
in this ongoing saga. It at least gets us halfway there. Those looking at the
generated code will no longer be puzzled about why there are two ways of
serializing a record in the packed binary serialization format, when
Record.serialize suffices for all current and future formats.
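
For those who have not been following HADOOP-941, the design point is roughly
this: a record exposes one format-agnostic serialize entry point that takes an
output archive, and each wire format (packed binary, CSV, XML) is just another
implementation of that archive, so the generated code needs no second,
binary-only path. Below is a small self-contained model of that pattern; the
names (RecordOutput, BinaryRecordOutput, MyRecord) only mirror the record I/O
vocabulary and are not the actual library or generated classes.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Format-agnostic sink: one method per primitive the translator emits.
interface RecordOutput {
  void writeInt(int i, String tag) throws IOException;
  void writeString(String s, String tag) throws IOException;
}

// Packed-binary implementation of the sink.
class BinaryRecordOutput implements RecordOutput {
  private final DataOutputStream out;
  BinaryRecordOutput(DataOutputStream out) { this.out = out; }
  public void writeInt(int i, String tag) throws IOException { out.writeInt(i); }
  public void writeString(String s, String tag) throws IOException { out.writeUTF(s); }
}

// CSV implementation of the same sink; the record code does not change.
class CsvRecordOutput implements RecordOutput {
  private final StringBuilder sb = new StringBuilder();
  public void writeInt(int i, String tag) { sb.append(i).append(','); }
  public void writeString(String s, String tag) { sb.append(s).append(','); }
  public String toString() { return sb.toString(); }
}

// What a generated record boils down to: a single serialize entry point.
class MyRecord {
  int id;
  String name;
  void serialize(RecordOutput rout, String tag) throws IOException {
    rout.writeInt(id, "id");
    rout.writeString(name, "name");
  }
}

public class SerializeDemo {
  public static void main(String[] args) throws IOException {
    MyRecord r = new MyRecord();
    r.id = 42;
    r.name = "jute";

    // Same record, two formats, one code path through serialize().
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    r.serialize(new BinaryRecordOutput(new DataOutputStream(buf)), "r");
    System.out.println("binary bytes: " + buf.size());

    CsvRecordOutput csv = new CsvRecordOutput();
    r.serialize(csv, "r");
    System.out.println("csv: " + csv);
  }
}

A binary-only write(DataOutput) method on the record adds nothing that the
binary archive above does not already provide, which is why the patch drops it.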

> Make Record I/O usable independent of Hadoop
> ---------------------------------------------
>
>                 Key: HADOOP-1053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1053
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.11.2
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.13.0
>
>         Attachments: jute-patch.txt
>
>
> This issue has been created to separate one proposal originally included in 
> HADOOP-941, for which no consensus could be reached. For earlier discussion 
> about the issue, please see HADOOP-941.
> I will summarize the proposal here. We need to provide a way for users who
> want to use the record I/O framework outside of Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
