Re: [SPARK-48423] Unable to save ML Pipeline to azure blob storage

2024-06-19 Thread Chhavi Bansal
Hello Team,
I am pinging back on this thread to get a pair of eyes on this issue.
Ticket:  https://issues.apache.org/jira/browse/SPARK-48423

On Thu, 6 Jun 2024 at 00:19, Chhavi Bansal  wrote:

> Hello team,
> I was exploring on how to save ML pipeline to azure blob storage, but was
> setback by an issue where it complains of  `fs.azure.account.key`  not
> being found in the configuration even when I have provided the values in
> the pipelineModel.option(key1,value1) field. I considered raising a
> ticket on spark https://issues.apache.org/jira/browse/SPARK-48423, where
> I describe the entire scenario. I tried debugging the code and found that
> this key is being explicitly asked for in the code. The only solution was
> to again set it part of spark.conf which could result to a race condition
> since we work on multi-tenant architecture.
>
>
>
> Since saving to Azure blob storage would be common, Can someone please
> guide me if I am missing something in the `.option` clause?
>
>
>
> I would be happy to make a contribution to the code if someone can shed
> some light on how this could be solved.
>
> --
> Thanks and Regards,
> Chhavi Bansal
>


-- 
Thanks and Regards,
Chhavi Bansal


Re: [SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)

2024-06-08 Thread Chhavi Bansal
Hi Someshwar,
Thanks for the response, I have added my comments to the ticket
<https://issues.apache.org/jira/browse/SPARK-48463>.


Thanks,
Chhavi Bansal

On Thu, 6 Jun 2024 at 17:28, Someshwar Kale  wrote:

> As a fix, you may consider adding a transformer to rename columns (perhaps
> replace all columns with dot to underscore) and use the renamed columns in
> your pipeline as below-
>
> val renameColumn = new 
> RenameColumn().setInputCol("location.longitude").setOutputCol("location_longitude")
> val si = new 
> StringIndexer().setInputCol("location_longitude").setOutputCol("longitutdee")
> val pipeline = new Pipeline().setStages(Array(renameColumn, si))
> pipeline.fit(flattenedDf).transform(flattenedDf).show()
>
>
> refer my comment
> <https://issues.apache.org/jira/browse/SPARK-48463?focusedCommentId=17852751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17852751>
>  for
> elaboration.
> Thanks!!
>
> *Regards,*
> *Someshwar Kale*
>
>
>
>
>
> On Thu, Jun 6, 2024 at 3:24 AM Chhavi Bansal 
> wrote:
>
>> Hello team
>> I was exploring feature transformation exposed via Mllib on nested
>> dataset, and encountered an error while applying any transformer to a
>> column with dot notation naming. I thought of raising a ticket on spark
>> https://issues.apache.org/jira/browse/SPARK-48463, where I have
>> mentioned the entire scenario.
>>
>> I wanted to get suggestions on what would be the best way to solve the
>> problem while using the dot notation. One workaround is to use`_` while
>> flattening the dataframe, but that would mean having an additional overhead
>> to convert back to `.` (dot notation ) since that’s the convention for our
>> other flattened data.
>>
>> I would be happy to make a contribution to the code if someone can shed
>> some light on how this could be solved.
>>
>>
>>
>> --
>> Thanks and Regards,
>> Chhavi Bansal
>>
>

-- 
Thanks and Regards,
Chhavi Bansal


[SPARK-48423] Unable to save ML Pipeline to azure blob storage

2024-06-05 Thread Chhavi Bansal
Hello team,
I was exploring on how to save ML pipeline to azure blob storage, but was
setback by an issue where it complains of  `fs.azure.account.key`  not
being found in the configuration even when I have provided the values in
the pipelineModel.option(key1,value1) field. I considered raising a ticket
on spark https://issues.apache.org/jira/browse/SPARK-48423, where I
describe the entire scenario. I tried debugging the code and found that
this key is being explicitly asked for in the code. The only solution was
to again set it part of spark.conf which could result to a race condition
since we work on multi-tenant architecture.



Since saving to Azure blob storage would be common, Can someone please
guide me if I am missing something in the `.option` clause?



I would be happy to make a contribution to the code if someone can shed
some light on how this could be solved.

-- 
Thanks and Regards,
Chhavi Bansal


[SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)

2024-06-05 Thread Chhavi Bansal
Hello team
I was exploring feature transformation exposed via Mllib on nested dataset,
and encountered an error while applying any transformer to a column with
dot notation naming. I thought of raising a ticket on spark
https://issues.apache.org/jira/browse/SPARK-48463, where I have mentioned
the entire scenario.

I wanted to get suggestions on what would be the best way to solve the
problem while using the dot notation. One workaround is to use`_` while
flattening the dataframe, but that would mean having an additional overhead
to convert back to `.` (dot notation ) since that’s the convention for our
other flattened data.

I would be happy to make a contribution to the code if someone can shed
some light on how this could be solved.



-- 
Thanks and Regards,
Chhavi Bansal