[
https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15676999#comment-15676999
]
Sourav Mazumder commented on BAHIR-67:
--------------------------------------
Hi Steve,
A few follow-up questions to get more clarity on your comment:
1. Are you suggesting we use the hadoop-hdfs/hadoop-hdfs-client jar so that
we can use FileSystem URIs such as "webhdfs://<HOST>:<HTTP_PORT>/<PATH>"
instead of HTTP URLs such as
"http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=..." ? (I'm referring to
the section "FileSystem URIs vs HTTP URLs" in
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#FileSystem_URIs_vs_HTTP_URLs)
2. Are you suggesting we use this in the main code or in the integration
test code?
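For clarity on question 1, here is a minimal sketch (illustrative only; the helper name and host are made up, and this is not Hadoop API code) of how a webhdfs:// FileSystem URI maps onto the raw HTTP URL form described in that doc section:

```python
# Illustrative mapping from a WebHDFS FileSystem URI to the equivalent
# raw HTTP REST URL, per the "FileSystem URIs vs HTTP URLs" section of
# the WebHDFS documentation. The helper name is hypothetical.
from urllib.parse import urlparse

def webhdfs_uri_to_http_url(fs_uri: str, op: str = "OPEN") -> str:
    """Translate webhdfs://<HOST>:<HTTP_PORT>/<PATH> into
    http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OP>."""
    parsed = urlparse(fs_uri)
    assert parsed.scheme == "webhdfs", "expected a webhdfs:// FileSystem URI"
    return f"http://{parsed.netloc}/webhdfs/v1{parsed.path}?op={op}"

print(webhdfs_uri_to_http_url("webhdfs://namenode.example.com:50070/user/sourav/data.csv"))
# -> http://namenode.example.com:50070/webhdfs/v1/user/sourav/data.csv?op=OPEN
```

With the hadoop-hdfs-client jar on the classpath, the FileSystem form would of course be handled by Hadoop itself rather than by hand-built URLs like this.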
Regards,
Sourav
On Fri, Nov 18, 2016 at 6:34 AM, Steve Loughran (JIRA) <[email protected]> wrote:
> WebHDFS Data Source for Spark SQL
> ---------------------------------
>
> Key: BAHIR-67
> URL: https://issues.apache.org/jira/browse/BAHIR-67
> Project: Bahir
> Issue Type: New Feature
> Components: Spark SQL Data Sources
> Reporter: Sourav Mazumder
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Ability to read/write data in Spark from/to the HDFS of a remote Hadoop cluster.
> In today's world of analytics, many use cases need the capability to access
> data from multiple remote data sources in Spark. Though Spark integrates well
> with a local Hadoop cluster, it is sorely lacking in the capability to connect
> to a remote Hadoop cluster. In reality, however, not all enterprise data lives
> in a single Hadoop cluster, and running the Spark cluster co-located with the
> Hadoop cluster is not always a solution.
> In this improvement we propose to create a connector for accessing data (read
> and write) from/to the HDFS of a remote Hadoop cluster from Spark, using the
> webhdfs API.
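For illustration, a remote read over the webhdfs REST API could be sketched roughly as follows (hypothetical host and path; a reachable namenode is required to actually run the read). Per the WebHDFS docs, the namenode answers an op=OPEN request with a 307 redirect to a datanode, which standard HTTP clients follow automatically:

```python
# Sketch of reading a file from a remote cluster via the WebHDFS REST API.
# Host and path below are hypothetical; this is not the proposed connector.
from urllib.request import urlopen

def open_op_url(host: str, port: int, path: str) -> str:
    # op=OPEN is the WebHDFS read operation; the namenode replies with a
    # 307 redirect to a datanode holding the block, which urlopen() follows.
    return f"http://{host}:{port}/webhdfs/v1{path}?op=OPEN"

def read_remote_file(host: str, port: int, path: str) -> bytes:
    with urlopen(open_op_url(host, port, path)) as resp:
        return resp.read()

if __name__ == "__main__":
    data = read_remote_file("namenode.example.com", 50070, "/user/sourav/data.csv")
    print(len(data))
```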
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)