Yuqing Xue created SPARK-46314:
----------------------------------

             Summary: If Hadoop is not installed and configured, can the Spark cluster read and write OBS in standalone mode?
                 Key: SPARK-46314
                 URL: https://issues.apache.org/jira/browse/SPARK-46314
             Project: Spark
          Issue Type: IT Help
          Components: Connect, Input/Output, PySpark
    Affects Versions: 3.4.1
         Environment: Python3.8
            Reporter: Yuqing Xue
If Hadoop is not deployed, how can PySpark read data from OBS buckets and convert it to an RDD? The following code fails with: No FileSystem for scheme "obs". Can Spark read and write OBS without Hadoop being installed and configured?

{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.app.name", "read and write OBS")
conf.set("spark.security.credentials.hbase.enabled", "true")
# ak and sk hold the OBS access key and secret key
conf.set("spark.hadoop.fs.obs.access.key", ak)
conf.set("spark.hadoop.fs.obs.secret.key", sk)
conf.set("spark.hadoop.fs.obs.endpoint", "http://xxx")

spark = SparkSession.builder.config(conf=conf).getOrCreate()

df = spark.read.json("obs://bucket_name/xxx.json")
df.coalesce(2).write.json("obs://bucket_name/", mode="overwrite")
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
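For context: "No FileSystem for scheme" errors generally mean no FileSystem implementation class is registered for that URI scheme in the Hadoop client libraries Spark ships with, i.e. the OBS connector jar (hadoop-huaweicloud / OBSA) is not on the classpath. A running Hadoop cluster is not required in standalone mode, but the connector jar and its implementation mapping are. A minimal configuration sketch, assuming the connector's documented class names `org.apache.hadoop.fs.obs.OBSFileSystem` and `org.apache.hadoop.fs.obs.OBS` (verify these against the hadoop-huaweicloud version actually deployed):

{code}
# spark-defaults.conf sketch -- assumes the hadoop-huaweicloud (OBSA) jar
# and its dependencies have been copied into Spark's jars/ directory
spark.hadoop.fs.obs.impl                      org.apache.hadoop.fs.obs.OBSFileSystem
spark.hadoop.fs.AbstractFileSystem.obs.impl   org.apache.hadoop.fs.obs.OBS
spark.hadoop.fs.obs.endpoint                  http://xxx
{code}

The same `fs.obs.impl` key can instead be set on the SparkConf in the snippet above (prefixed with `spark.hadoop.`), which is often more convenient for per-job credentials.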