[
https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573088#comment-15573088
]
Steve Loughran commented on BAHIR-67:
-------------------------------------
This is very much a sibling of the SPARK-7481 patch, where I've been trying to
add a module for dependencies and tests. Ignoring the problem of getting a
webhdfs JAR into SPARK_HOME/jars, the tests in that module should cover what's
needed, both in terms of operations (basic IO) and the more minimal
classpath/config checking.
I think you could bring up a minidfs cluster in webhdfs mode, and so have a
functional test of things.
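As a rough illustration of what a "basic IO" functional test over webhdfs would
exercise, here is a minimal sketch of the REST calls involved. It only builds
the requests and does not send them. The helper names (webhdfs_request,
basic_io_plan), the host "localhost" and the port 50070 are assumptions made
for the example, not part of this issue or of the SPARK-7481 patch; a real test
would take the NameNode HTTP address from the running cluster.

```python
from urllib.parse import quote, urlencode

# HTTP method for each WebHDFS operation in a basic IO round trip.
WEBHDFS_METHODS = {
    "CREATE": "PUT",
    "GETFILESTATUS": "GET",
    "OPEN": "GET",
    "DELETE": "DELETE",
}


def webhdfs_request(host, port, path, op, **params):
    """Return (http_method, url) for one WebHDFS REST call.

    host and port are placeholders for the NameNode HTTP address
    (hypothetical values, chosen by the caller).
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    method = WEBHDFS_METHODS[op]
    # "op" goes first; extra parameters are sorted so the URL is stable.
    query = urlencode([("op", op)] + sorted(params.items()))
    return method, f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def basic_io_plan(host, port, path):
    """List the requests for a write / stat / read / delete round trip.

    Note: the NameNode answers CREATE and OPEN with a redirect to a
    DataNode, so a real client has to follow that redirect.
    """
    return [
        webhdfs_request(host, port, path, "CREATE", overwrite="true"),
        webhdfs_request(host, port, path, "GETFILESTATUS"),
        webhdfs_request(host, port, path, "OPEN"),
        webhdfs_request(host, port, path, "DELETE"),
    ]


if __name__ == "__main__":
    for method, url in basic_io_plan("localhost", 50070, "/tmp/bahir-67.txt"):
        print(method, url)
```

Against a minidfs cluster, a test could send these requests in order and check
that the bytes read back match the bytes written.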
> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> ---------------------------------------------------------------------------
>
> Key: BAHIR-67
> URL: https://issues.apache.org/jira/browse/BAHIR-67
> Project: Bahir
> Issue Type: Improvement
> Components: Spark SQL Data Sources
> Reporter: Sourav Mazumder
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> In today's world of analytics, many use cases need the capability to access
> data from multiple remote data sources in Spark. Though Spark integrates well
> with a local Hadoop cluster, it is weak at connecting to a remote Hadoop
> cluster. In reality, not all of an enterprise's data is in Hadoop, and
> running the Spark cluster locally with the Hadoop cluster is not always a
> solution.
> In this improvement we propose to create a connector for accessing data (read
> and write) from/to HDFS of a remote Hadoop cluster from Spark using the
> webhdfs API.