[jira] [Updated] (BIGTOP-358) now that hadoop packages have been split we have to update the dependencies on the downstream packages

Roman Shaposhnik (Updated) (JIRA) Fri, 27 Jan 2012 11:52:35 -0800

     [ 
https://issues.apache.org/jira/browse/BIGTOP-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Shaposhnik updated BIGTOP-358:
------------------------------------

    Attachment: bigtop.dot
                bigtop.png

The attached dot and png files show what I have figured out so far
(rectangular boxes represent capabilities that will be provided by actual
packages, and dotted lines represent "optional/recommended" dependencies).
I still have a few concerns:

1. I think it is pretty clear by now that the mapreduce dependency has to be on
a capability, not an actual package (and then we'll have hadoop-mapreduce
"Provides:" that capability). The question is whether we are ready to do the
same with hadoop-hdfs, and what those capabilities should be called. My
proposal is to call them "mapreduce" and "dfs" respectively, and to make the
actual packages hadoop-mapreduce and hadoop-hdfs provide those capabilities
for now.

2. For pig, hive, sqoop and mahout the real hard dependency is mapreduce. The
dependency on dfs is an optional one (they can run just fine in local mode
without ever talking to HDFS). The question is -- what's the best mechanism to
"recommend" dfs? I know we can do that with Debian packages (the Recommends
field), but what about RPM? Finally, are we doing the right thing here by
treating dfs as an optional dependency, or should we enforce it to begin with?

3. HBase is a weird case here -- at the Maven level they package all of their
dependencies (optional or not) into lib/*, so they end up with a whole bunch of
jars there that we're currently replacing with symlinks. Not all of those
dependencies are needed by HBase in all cases (in fact, the only hard
dependency there is ZooKeeper), but having dangling symlinks doesn't seem
appealing. The question is -- what do we do?
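
To make points 1 and 2 concrete, here is a minimal Python sketch of how a
dependency on a capability could be resolved. It is only a toy model, not how
rpm or dpkg actually work: the Package class and the providers() and check()
helpers are hypothetical names made up for illustration, and the capability
names "mapreduce" and "dfs" are simply the ones proposed in point 1.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    provides: set = field(default_factory=set)    # capabilities, like a "Provides:" tag
    requires: set = field(default_factory=set)    # hard dependencies on capabilities
    recommends: set = field(default_factory=set)  # optional/recommended dependencies


def providers(capability, repo):
    """Return the sorted names of packages in repo that satisfy capability."""
    return sorted(p.name for p in repo
                  if capability == p.name or capability in p.provides)


def check(pkg, repo):
    """Split pkg's unsatisfied dependencies into (missing_hard, missing_optional)."""
    missing_hard = sorted(c for c in pkg.requires if not providers(c, repo))
    missing_optional = sorted(c for c in pkg.recommends if not providers(c, repo))
    return missing_hard, missing_optional


# Hypothetical repository contents, following the naming proposed in point 1.
hadoop_mapreduce = Package("hadoop-mapreduce", provides={"mapreduce"})
hadoop_hdfs = Package("hadoop-hdfs", provides={"dfs"})
pig = Package("pig", requires={"mapreduce"}, recommends={"dfs"})

local_mode_repo = [hadoop_mapreduce, pig]         # no HDFS package available
full_repo = [hadoop_mapreduce, hadoop_hdfs, pig]
```

In this model, check(pig, local_mode_repo) reports no missing hard dependencies
and lists "dfs" as the only unsatisfied recommendation, which is the local-mode
behaviour described in point 2. Any other package that provides "mapreduce"
would satisfy pig equally well, which is the reason for depending on a
capability rather than on hadoop-mapreduce directly.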
                
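For the HBase case in point 3, one way to see how many symlinks would be left
dangling for a given set of installed packages is a small check like the one
below. This is a sketch only: dangling_symlinks() is a hypothetical helper, and
the directory to scan is whatever lib directory the HBase package ends up
using, which is not specified here.

```python
import os


def dangling_symlinks(lib_dir):
    """Return the sorted names of symlinks in lib_dir whose targets do not exist."""
    dangling = []
    for name in os.listdir(lib_dir):
        path = os.path.join(lib_dir, name)
        # os.path.islink() is true for any symlink, while os.path.exists()
        # follows the link and is false when the target is missing.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return sorted(dangling)
```

An empty result means every symlinked jar resolves to a real file; any names
returned are jars whose providing package is not installed.
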
> now that hadoop packages have been split we have to update the dependencies 
> on the downstream packages
> ------------------------------------------------------------------------------------------------------
>
>                 Key: BIGTOP-358
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-358
>             Project: Bigtop
>          Issue Type: Bug
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>         Attachments: bigtop.dot, bigtop.png
>
>
> This is actually slightly more complicated than it sounds: it is pretty
> straightforward to replace a dependency on hadoop with a dependency on
> hadoop-mapreduce, but it is less clear what to do with HDFS. Strictly
> speaking, HDFS is not a hard dependency (one can run on a local filesystem
> just fine).
> Thoughts?

