Re: [SPARK-48423] Unable to save ML Pipeline to azure blob storage
Hello Team,

I am pinging back on this thread to get a pair of eyes on this issue.
Ticket: https://issues.apache.org/jira/browse/SPARK-48423

On Thu, 6 Jun 2024 at 00:19, Chhavi Bansal wrote:

> Hello team,
> I was exploring on how to save ML pipeline to azure blob storage, but was
> setback by an issue where it complains of `fs.azure.account.key` not
> being found in the configuration even when I have provided the values in
> the pipelineModel.option(key1,value1) field. I considered raising a
> ticket on spark https://issues.apache.org/jira/browse/SPARK-48423, where
> I describe the entire scenario. I tried debugging the code and found that
> this key is being explicitly asked for in the code. The only solution was
> to again set it part of spark.conf which could result to a race condition
> since we work on multi-tenant architecture.
>
> Since saving to Azure blob storage would be common, Can someone please
> guide me if I am missing something in the `.option` clause?
>
> I would be happy to make a contribution to the code if someone can shed
> some light on how this could be solved.
>
> --
> Thanks and Regards,
> Chhavi Bansal

--
Thanks and Regards,
Chhavi Bansal
Re: [SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)
Hi Someshwar,

Thanks for the response, I have added my comments to the ticket
<https://issues.apache.org/jira/browse/SPARK-48463>.

Thanks,
Chhavi Bansal

On Thu, 6 Jun 2024 at 17:28, Someshwar Kale wrote:

> As a fix, you may consider adding a transformer to rename columns (perhaps
> replace all columns with dot to underscore) and use the renamed columns in
> your pipeline as below-
>
> val renameColumn = new RenameColumn().setInputCol("location.longitude").setOutputCol("location_longitude")
> val si = new StringIndexer().setInputCol("location_longitude").setOutputCol("longitutdee")
> val pipeline = new Pipeline().setStages(Array(renameColumn, si))
> pipeline.fit(flattenedDf).transform(flattenedDf).show()
>
> refer my comment
> <https://issues.apache.org/jira/browse/SPARK-48463?focusedCommentId=17852751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17852751>
> for elaboration.
> Thanks!!
>
> *Regards,*
> *Someshwar Kale*
>
> On Thu, Jun 6, 2024 at 3:24 AM Chhavi Bansal wrote:
>
>> Hello team
>> I was exploring feature transformation exposed via Mllib on nested
>> dataset, and encountered an error while applying any transformer to a
>> column with dot notation naming. I thought of raising a ticket on spark
>> https://issues.apache.org/jira/browse/SPARK-48463, where I have
>> mentioned the entire scenario.
>>
>> I wanted to get suggestions on what would be the best way to solve the
>> problem while using the dot notation. One workaround is to use`_` while
>> flattening the dataframe, but that would mean having an additional overhead
>> to convert back to `.` (dot notation ) since that’s the convention for our
>> other flattened data.
>>
>> I would be happy to make a contribution to the code if someone can shed
>> some light on how this could be solved.
>>
>> --
>> Thanks and Regards,
>> Chhavi Bansal

--
Thanks and Regards,
Chhavi Bansal
[SPARK-48423] Unable to save ML Pipeline to azure blob storage
Hello team,

I was exploring how to save an ML pipeline to Azure Blob Storage, but was set back by an issue where the save complains that `fs.azure.account.key` is not found in the configuration, even though I provided the values through pipelineModel.option(key1,value1). I raised a ticket on Spark, https://issues.apache.org/jira/browse/SPARK-48423, where I describe the entire scenario. I tried debugging the code and found that this key is explicitly looked up in the code. The only workaround I found was to also set it in spark.conf, which could lead to a race condition since we work on a multi-tenant architecture.

Since saving to Azure Blob Storage would be a common use case, can someone please guide me if I am missing something in the `.option` clause?

I would be happy to contribute a fix if someone can shed some light on how this could be solved.

--
Thanks and Regards,
Chhavi Bansal
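For readers following the thread, here is a minimal sketch of the spark.conf workaround described above. It is an illustration under assumptions, not a fix for the ticket: the storage account, container, and target path are hypothetical, and the full key name follows the usual hadoop-azure (WASB) pattern `fs.azure.account.key.<account>.blob.core.windows.net`, which the message itself does not spell out.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object SavePipelineToBlob {
  // Hypothetical names, for illustration only.
  val storageAccount = "myaccount"
  val container = "models"

  def save(spark: SparkSession, model: PipelineModel, accountKey: String): Unit = {
    // Assumption: the key is set on the session configuration, as the
    // message describes. This setting is visible to everything that shares
    // the session, which is the multi-tenant concern raised in the thread.
    spark.conf.set(
      s"fs.azure.account.key.$storageAccount.blob.core.windows.net",
      accountKey)

    // MLWriter API: write -> overwrite() -> save(path).
    model.write
      .overwrite()
      .save(s"wasbs://$container@$storageAccount.blob.core.windows.net/pipelines/example")
  }
}
```

Because the key lives on shared configuration rather than on the individual write, two tenants saving at the same time with different keys can overwrite each other's setting. That is the race condition the message is worried about.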
[SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)
Hello team,

I was exploring the feature transformers exposed via MLlib on a nested dataset, and encountered an error when applying any transformer to a column whose name uses dot notation. I raised a ticket on Spark, https://issues.apache.org/jira/browse/SPARK-48463, where I describe the entire scenario.

I would like suggestions on the best way to solve the problem while keeping the dot notation. One workaround is to use `_` when flattening the dataframe, but that adds the overhead of converting back to `.` (dot notation) afterwards, since that’s the convention for our other flattened data.

I would be happy to contribute a fix if someone can shed some light on how this could be solved.

--
Thanks and Regards,
Chhavi Bansal
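The underscore workaround mentioned above, including the conversion back to dot notation, could look roughly like the sketch below. The object and helper names (`DottedColumns`, `renameDotted`, `restoreDotted`, `indexLongitude`) and the output column name are hypothetical. The column `location.longitude` is taken from the example discussed on the ticket. The sketch assumes no existing column already uses the underscore form of a dotted name.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.DataFrame

object DottedColumns {
  // Replace dots so ML transformers see plain column names.
  def toSafe(name: String): String = name.replace(".", "_")

  // Returns the renamed DataFrame plus a map from safe name to original name.
  // Assumption: names stay unique after replacement ("a.b" and "a_b" would clash).
  def renameDotted(df: DataFrame): (DataFrame, Map[String, String]) = {
    val restore = df.columns.map(c => toSafe(c) -> c).toMap
    (df.toDF(df.columns.map(toSafe): _*), restore)
  }

  // Restore the original dotted names; columns added by the
  // transformer are not in the map and keep their names.
  def restoreDotted(df: DataFrame, restore: Map[String, String]): DataFrame =
    df.toDF(df.columns.map(c => restore.getOrElse(c, c)): _*)

  def indexLongitude(flattenedDf: DataFrame): DataFrame = {
    val (safeDf, restore) = renameDotted(flattenedDf)
    val indexer = new StringIndexer()
      .setInputCol("location_longitude")
      .setOutputCol("location_longitude_index")
    restoreDotted(indexer.fit(safeDf).transform(safeDf), restore)
  }
}
```

`toDF` renames columns by position, so the dotted names never have to be resolved as column expressions. The cost is the extra rename pass on either side of the pipeline, which is the overhead the message refers to.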