Yuqing Xue created SPARK-46314:
----------------------------------

             Summary: If Hadoop is not installed and configured, can the Spark 
cluster read and write OBS in standalone mode?
                 Key: SPARK-46314
                 URL: https://issues.apache.org/jira/browse/SPARK-46314
             Project: Spark
          Issue Type: IT Help
          Components: Connect, Input/Output, PySpark
    Affects Versions: 3.4.1
         Environment: Python3.8
            Reporter: Yuqing Xue


If Hadoop is not deployed, how can I use the PySpark APIs to read data from OBS 
buckets and convert the data to an RDD?

The following code reports the error: No FileSystem for scheme "obs". Can Spark 
read and write OBS without Hadoop being installed and configured?
{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.app.name", "read and write OBS")
conf.set("spark.security.credentials.hbase.enabled", "true")
# ak, sk and the endpoint are placeholders for the real OBS credentials.
conf.set("spark.hadoop.fs.obs.access.key", ak)
conf.set("spark.hadoop.fs.obs.secret.key", sk)
conf.set("spark.hadoop.fs.obs.endpoint", "http://xxx")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

df = spark.read.json("obs://bucket_name/xxx.json")
df.coalesce(2).write.json("obs://bucket_name/", mode="overwrite")
{code}
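
For reference, I understand the error usually means that no implementation class is registered for the "obs" scheme, i.e. the OBS connector jar is not on the classpath. I would expect something like the following to be enough in standalone mode (a sketch only, assuming the Huawei OBSA connector jar, hadoop-huaweicloud, is available at a local path, which is a placeholder here), but I am not sure whether this works without a full Hadoop installation:

{code:python}
# Sketch under assumptions: the hadoop-huaweicloud (OBSA) connector jar and its
# dependencies are downloaded locally; the jar path below is a placeholder, and
# ak/sk/endpoint are the same placeholders as in the snippet above.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.app.name", "read and write OBS")
# Ship the connector jar to the driver and executors (placeholder path).
conf.set("spark.jars", "/path/to/hadoop-huaweicloud.jar")
# Register the filesystem implementation for the "obs" scheme.
conf.set("spark.hadoop.fs.obs.impl", "org.apache.hadoop.fs.obs.OBSFileSystem")
conf.set("spark.hadoop.fs.obs.access.key", ak)
conf.set("spark.hadoop.fs.obs.secret.key", sk)
conf.set("spark.hadoop.fs.obs.endpoint", "http://xxx")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

df = spark.read.json("obs://bucket_name/xxx.json")
{code}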


