[ https://issues.apache.org/jira/browse/SPARK-48423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856332#comment-17856332 ]
Chhavi Bansal commented on SPARK-48423:
---------------------------------------

[~weichenxu123] Sorry for tagging you, but you helped on another ticket, SPARK-48463. Could you please let me know how I can get attention on this ticket?

> Unable to write MLPipeline to blob storage using .option attribute
> ------------------------------------------------------------------
>
>                 Key: SPARK-48423
>                 URL: https://issues.apache.org/jira/browse/SPARK-48423
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib, Spark Core
>    Affects Versions: 3.4.3
>            Reporter: Chhavi Bansal
>            Priority: Blocker
>
> I am trying to write an MLlib pipeline, with a series of stages set in it, to
> Azure Blob Storage, passing the relevant write parameters, but it still complains
> that `fs.azure.account.key` is not found in the configuration.
> Sharing the code.
> {code:java}
> import org.apache.spark.ml.Pipeline
> import org.apache.spark.ml.feature.StringIndexer
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().appName("main").master("local[4]").getOrCreate()
> import spark.implicits._
>
> val df = spark.createDataFrame(Seq(
>   (0L, "a b c d e spark"),
>   (1L, "b d")
> )).toDF("id", "text")
>
> val si = new StringIndexer().setInputCol("text").setOutputCol("IND_text")
> val pipelinee = new Pipeline().setStages(Array(si))
> val pipelineModel = pipelinee.fit(df)
>
> // BLOB_STORAGE_PATH is a placeholder for the target blob storage path
> val path = BLOB_STORAGE_PATH
> pipelineModel.write
>   .option("spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "__")
>   .save(path){code}
>
> The error that I get is
> {code:java}
> Failure to initialize configuration
> Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
>   at org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
>   at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:548)
>   at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1449){code}
> This shows that even though the key/value pair for
> {code:java}
> spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net {code}
> is being passed via the option parameter, it is not being set internally.
>
> The write works only if I explicitly set the values with
> {code:java}
> spark.conf.set(key,value) {code}
> which might be problematic for a multi-tenant solution that shares
> the same Spark context.
> One other observation is that
> {code:java}
> df.write.option(key1,value1).option(key2,value2).save(path) {code}
> fails with the same key error, while
> {code:java}
> val map = Map(key1->value1, key2->value2)
> df.write.options(map).save(path) {code}
> works.
>
> Help required on: Similar to how the DataFrame `options` call
> {code:java}
> df.write.options(Map(key1->value1, key2->value2)) {code}
> sets the configuration, *.option(key1, value1)* should also work
> when writing to Azure Blob Storage.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org