Hi Janardhan,

I have run into similar issues and asked similar questions. I also hit
many problems with private code when trying to write my own
Model/Transformer/Estimator. (You might be able to find my question to the
group about this; I can't really tell whether my emails are getting through,
since I don't get any responses.) For now I have resorted to copying the
code I need out of the Spark codebase and into mine. I'm certain this is
not the best approach, but it has to be better than "implementing it
myself," which is what the only response to my question suggested.
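For reference, the workaround I've been using looks roughly like the sketch below. Since HasInputCol/HasOutputCol and DefaultParamsWritable are private to Spark ML, you can declare equivalent Params directly on your own Transformer. This is only a sketch, not Spark's own implementation: the class name, param names, and the per-row upper-casing logic are all made up for illustration.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// A custom Transformer that declares its own input/output column Params
// instead of mixing in the private HasInputCol/HasOutputCol traits.
class UpperCaseTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("upper"))

  // Hand-rolled equivalents of the private shared Params.
  val inputCol: Param[String]  = new Param(this, "inputCol", "input column name")
  val outputCol: Param[String] = new Param(this, "outputCol", "output column name")
  def setInputCol(value: String): this.type  = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    // Placeholder row-level logic; substitute your own.
    val f = udf { s: String => s.toUpperCase }
    dataset.withColumn($(outputCol), f(col($(inputCol))))
  }

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}
```

You lose DefaultParamsWritable persistence this way, but the transformer is usable inside a Pipeline.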

As for the transforms, I also asked a similar question. The only way I've
seen it done in code is with a UDF. As you mention, a UDF can only access
fields on a row-by-row basis. I haven't gotten any replies to my question
either, but I also need to do a more complicated operation in my work
(join to another model RDD, flat-map, calculate, reduce) in order to
compute the value for the new column. So far no great solution.

Sorry I don't have any answers, but I wanted to chime in that I am also a
bit stuck on similar issues. Hope we can find a workable solution soon.
Cheers,
Thunder



On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty <janardhan...@gmail.com>
wrote:

> Noticed a few things about Spark transformers; just wanted to be clear.
>
> Unary transformer:
>
> createTransformFunc: IN => OUT  = { *item* => }
> Here *item *is single element and *NOT* entire column.
>
> I would like to get the number of elements in that particular column.
> Since there is *no forward checking*, how can we get this information?
> We have visibility into a single element, not the entire column.
>
> On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty <janardhan...@gmail.com>
> wrote:
>
>> In scala Spark ML Dataframes.
>>
>> On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar <
>> somasundar.se...@tigeranalytics.com> wrote:
>>
>>> Can you try this
>>>
>>>
>>> https://www.linkedin.com/pulse/hive-functions-udfudaf-udtf-examples-gaurav-singh
>>>
>>> On 4 Sep 2016 9:38 pm, "janardhan shetty" <janardhan...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any chance that we can send multiple entire columns to a UDF
>>>> and generate a new column for Spark ML?
>>>> I see a similar approach in VectorAssembler, but I am not able to use a
>>>> few classes/traits like HasInputCols, HasOutputCol, DefaultParamsWritable,
>>>> since they are private.
>>>>
>>>> Any leads/examples in this regard are appreciated.
>>>>
>>>> Requirement:
>>>> *Input*: Multiple columns of a Dataframe
>>>> *Output*:  Single new modified column
>>>>
>>>
>>
>
