[jira] [Commented] (SPARK-48423) Unable to write MLPipeline to blob storage using .option attribute

Chhavi Bansal (Jira) Wed, 19 Jun 2024 11:45:06 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-48423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856332#comment-17856332
 ]


Chhavi Bansal commented on SPARK-48423:
---------------------------------------

[~weichenxu123] Sorry for tagging you, but you helped on another ticket, 
SPARK-48463. Could you please let me know how I can get attention on this 
ticket?

> Unable to write MLPipeline to blob storage using .option attribute
> ------------------------------------------------------------------
>
>                 Key: SPARK-48423
>                 URL: https://issues.apache.org/jira/browse/SPARK-48423
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib, Spark Core
>    Affects Versions: 3.4.3
>            Reporter: Chhavi Bansal
>            Priority: Blocker
>
> I am trying to write an MLlib pipeline, with a series of stages set in it, to 
> Azure Blob Storage, passing the relevant write parameters, but it still 
> complains that `fs.azure.account.key` is not found in the configuration.
> Sharing the code:
> {code:java}
> import org.apache.spark.ml.Pipeline
> import org.apache.spark.ml.feature.StringIndexer
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().appName("main").master("local[4]").getOrCreate()
> import spark.implicits._
>
> val df = spark.createDataFrame(Seq(
>   (0L, "a b c d e spark"),
>   (1L, "b d")
> )).toDF("id", "text")
>
> // Single-stage pipeline that indexes the "text" column.
> val si = new StringIndexer().setInputCol("text").setOutputCol("IND_text")
> val pipeline = new Pipeline().setStages(Array(si))
> val pipelineModel = pipeline.fit(df)
>
> val path = BLOB_STORAGE_PATH
> // Credentials are passed per write via .option(); "__" is a placeholder value.
> pipelineModel.write
>   .option("spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "__")
>   .option("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "__")
>   .save(path)
> {code}
>  
> The error that I get is:
> {code:java}
> Failure to initialize configuration
> Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
>     at org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
>     at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:548)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1449)
> {code}
> This shows that even though the key/value pair for
> {code:java}
> spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net {code}
> is passed via the option parameter, it is not being set internally.
>  
> The write works only if I explicitly set the values with
> {code:java}
> spark.conf.set(key,value) {code}
> which might be problematic for a multi-tenant solution where tenants share 
> the same Spark context.
> One other observation is that
> {code:java}
> df.write.option(key1,value1).option(key2,value2).save(path)  {code}
> fails with the same key error, while
> {code:java}
> val map = Map(key1 -> value1, key2 -> value2)
> df.write.options(map).save(path) {code}
> works.
>  
> Help required on: just as the DataFrame `options` call
> {code:java}
> df.write.options(Map<key,value>) {code}
> sets the configuration, *.option(key1, value1)* should also work 
> when writing to Azure Blob Storage.
>  
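
A note on the `spark.conf.set` workaround described in the issue: it can at least be limited to the duration of one save. The sketch below is a minimal illustration and not part of Spark; `ScopedConf` and `withScopedConf` are hypothetical names, and it assumes the session-level setting is what the write picks up, as the report describes. It does not isolate tenants that run concurrently on the same session, because the keys stay visible session-wide while the body runs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not a Spark API): applies settings to the session's
// runtime configuration, runs `body`, then restores the earlier state.
object ScopedConf {
  def withScopedConf[T](spark: SparkSession, settings: Map[String, String])(body: => T): T = {
    // Remember the current value of every key so it can be restored afterwards.
    val previous: Map[String, Option[String]] =
      settings.keys.map(k => k -> spark.conf.getOption(k)).toMap

    settings.foreach { case (k, v) => spark.conf.set(k, v) }
    try {
      body
    } finally {
      // Put back the old value, or remove the key if it was not set before.
      previous.foreach {
        case (k, Some(old)) => spark.conf.set(k, old)
        case (k, None)      => spark.conf.unset(k)
      }
    }
  }
}

// Usage, reusing the names and placeholders from the report:
// ScopedConf.withScopedConf(spark, Map(
//   "fs.azure.account.key.<account_name>.dfs.core.windows.net" -> "__"
// )) {
//   pipelineModel.write.save(path)
// }
```

Restoring in `finally` means the credentials are removed from the session configuration even when the save throws, which narrows the exposure window without addressing the underlying `.option` behaviour reported here.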



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
