[
https://issues.apache.org/jira/browse/HADOOP-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477594
]
Milind Bhandarkar commented on HADOOP-1053:
-------------------------------------------
I agree with the points that David has made, i.e. the description of this Jira
issue should be modified to remove the "hadoop-dependency" as a goal, and
instead aim at making Hadoop record I/O functionally modular. (Thanks for those
poignant words, David. After all, the proper use of words does matter for
easing objections and reaching consensus.)
In the context of this discussion, I would like to know the opinion of
hadoop-committers: if a significant user base had asked for independent usage
of, say, only Hadoop DFS, without the Hadoop map-reduce framework, would they
have agreed to separate the two so that they could gain a huge base of users?
This may seem a hypothetical question, but I have been asked it by a lot of
people, so I can assure you all that it is indeed not hypothetical.
The particular question that I was asked by an aspiring web 2.0 company was
this: "I have been following the hadoop-dev list for the last three months, and
I have noticed that while the DFS side of Hadoop has been adding features, the
map-reduce side of things has been fixing serious bugs. In any case, I do not
need map-reduce, but I believe that DFS would be a great feature to have. Is
there a way to use just DFS in my project without the rest of Hadoop?"
My answer to that was: yes, of course, DFS is functionally modular. (P.S. They
do not want SequenceFile; they have their own ideas about storing key-value
pairs.) Was my answer correct? Should I instead be telling them, "No, you have
to use the map-reduce framework even if you do not need it"? (P.S. I have
suggested to them a way to repackage the jar so that they can use only DFS, and
to watch only the dfs-component traffic on hadoop-dev.)
So, now this comes back to me in the record I/O context. Why can't I say the
same to Hadoop record I/O users? (About three users, including two startup
founders that I happen to know, have asked me this.)
But long-term vision aside, I believe this patch is an important step forward
in this ongoing saga. It at least gets us halfway there. Those looking at the
generated code will no longer be puzzled about why there are two ways of
serializing a record in the packed binary serialization format, when
Record.serialize suffices for all current and future formats.
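To illustrate that last point, here is a minimal sketch of what the single
serialization path looks like from a user's perspective. The record class
(sample.Employee), its DDL, and the accessor names are hypothetical, and the
org.apache.hadoop.record class names are written from memory, so treat this as
an assumption-laden sketch rather than the exact generated API:

import java.io.ByteArrayOutputStream;
import org.apache.hadoop.record.BinaryRecordOutput;
import org.apache.hadoop.record.CsvRecordOutput;

public class SerializeExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical record generated by the record I/O compiler from a DDL like:
    //   module sample { class Employee { ustring name; int id; } }
    sample.Employee e = new sample.Employee();
    e.setName("alice");
    e.setId(42);

    // One serialize() entry point; the wire format is chosen by the
    // RecordOutput implementation passed in, not by extra generated methods.
    ByteArrayOutputStream packed = new ByteArrayOutputStream();
    e.serialize(new BinaryRecordOutput(packed), "Employee");   // packed binary

    ByteArrayOutputStream csv = new ByteArrayOutputStream();
    e.serialize(new CsvRecordOutput(csv), "Employee");         // CSV text
  }
}

The point of the sketch is that the caller never needs a format-specific
serialization method on the generated record; picking a different RecordOutput
is all that changes.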
> Make Record I/O usable independent of Hadoop
> ---------------------------------------------
>
> Key: HADOOP-1053
> URL: https://issues.apache.org/jira/browse/HADOOP-1053
> Project: Hadoop
> Issue Type: Improvement
> Components: record
> Affects Versions: 0.11.2
> Environment: All
> Reporter: Milind Bhandarkar
> Assigned To: Milind Bhandarkar
> Fix For: 0.13.0
>
> Attachments: jute-patch.txt
>
>
> This issue has been created to separate out one proposal originally included
> in HADOOP-941, for which no consensus could be reached. For earlier discussion
> about the issue, please see HADOOP-941.
> I will summarize the proposal here: we need to provide a way for users who
> want to use the record I/O framework outside of Hadoop.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.