[
https://issues.apache.org/jira/browse/BAHIR-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672003#comment-15672003
]
ASF GitHub Bot commented on BAHIR-75:
-------------------------------------
Github user ckadner commented on a diff in the pull request:
https://github.com/apache/bahir/pull/27#discussion_r88352457
--- Diff:
datasource-webhdfs/src/main/scala/org/apache/bahir/datasource/webhdfs/DefaultSource.scala
---
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.bahir.datasource.webhdfs
+
+import java.sql.{Date, Timestamp}
+import java.text.SimpleDateFormat
+
+import scala.annotation.switch
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
+import org.apache.spark.sql.sources._
+import org.apache.spark.sql.types._
+
+import org.apache.bahir.datasource.webhdfs.csv._
+import org.apache.bahir.datasource.webhdfs.util._
+
+/**
+ * This class provides functions for reading/writing data from/to a remote
+ * WebHDFS server as a Spark DataSource.
+ * It is written in line with the DataSource implementation in
+ * com.databricks.spark.csv.
+ */
+
+class DefaultSource
+ extends RelationProvider
+ with SchemaRelationProvider
+ with CreatableRelationProvider
+ with DataSourceRegister {
+
+ override def shortName(): String = "webhdfs"
+
+ private def checkPath(parameters: Map[String, String]): String = {
+ parameters.getOrElse("path", sys.error("'path' must be specified"))
+ }
+
+ /**
+ * Creates a new relation for data stored in CSV format, given parameters.
+ * Parameters must include 'path' and may optionally include 'delimiter',
+ * 'quote', and 'header'.
+ */
+ override def createRelation(
+ sqlContext: SQLContext,
+ parameters: Map[String, String]): BaseRelation = {
+ createRelation(sqlContext, parameters, null)
+ }
+
+ /**
+ * Creates a new relation for data stored in CSV format, given parameters
+ * and a user-supplied schema.
+ * Parameters must include 'path' and may optionally include 'delimiter',
+ * 'quote', and 'header'.
+ */
--- End diff --
yup, coded for `*.csv` only for now ...
@sourav-mazumder do you plan on adding support for other file formats like
`*.txt` and `*.json`?
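For context, the `checkPath` method in the diff follows the usual fail-fast
option-resolution pattern of a Spark `RelationProvider`. A minimal, Spark-free
sketch of that pattern (the option names `delimiter` and `header` and the
example URL are illustrative, not taken from the PR):

```scala
import scala.util.Try

// Mirrors DefaultSource.checkPath from the diff: 'path' is mandatory,
// everything else is optional and passed through to the relation.
def checkPath(parameters: Map[String, String]): String =
  parameters.getOrElse("path", sys.error("'path' must be specified"))

// A caller would pass the DataFrameReader options straight through:
val opts = Map(
  "path" -> "webhdfs://host:50070/data/sample.csv", // hypothetical URL
  "delimiter" -> ",",
  "header" -> "true"
)

println(checkPath(opts))                        // resolves the mandatory option
assert(Try(checkPath(opts - "path")).isFailure) // missing 'path' fails fast
```

With the datasource registered under the short name `webhdfs`, the same
options would arrive via `sqlContext.read.format("webhdfs").options(...)`.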
> Initial Code Delivery
> ---------------------
>
> Key: BAHIR-75
> URL: https://issues.apache.org/jira/browse/BAHIR-75
> Project: Bahir
> Issue Type: Sub-task
> Components: Spark SQL Data Sources
> Reporter: Sourav Mazumder
> Original Estimate: 504h
> Remaining Estimate: 504h
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)