Yep, totally with you on this. None of it is ideal, but it doesn't sound like
there will be any changes coming to the visibility of the ML supporting classes.
-Thunder

On Mon, Sep 12, 2016 at 10:10 AM janardhan shetty <janardhan...@gmail.com> wrote:
> Thanks Thunder. Copying the code base is difficult, since we would need to
> copy it in its entirety, including transitive dependency files.
> Complex operations that take a column as a whole, instead of each element
> in a row, are not possible as of now.
>
> Trying to find a few pointers to solve this easily.
>
> On Mon, Sep 12, 2016 at 9:43 AM, Thunder Stumpges <
> thunder.stump...@gmail.com> wrote:
>
>> Hi Janardhan,
>>
>> I have run into similar issues and asked similar questions. I also ran
>> into many problems with private code when trying to write my own
>> Model/Transformer/Estimator. (You might be able to find my question to the
>> group regarding this; I can't really tell if my emails are getting through,
>> as I don't get any responses.) For now I have resorted to copying the
>> code that I need out of the Spark codebase and into mine. I'm certain this
>> is not the best approach, but it has to be better than "implementing it
>> myself," which is what the only response to my question suggested.
>>
>> As for the transforms, I also asked a similar question. The only way I've
>> seen it done in code is using a UDF. As you mention, the UDF can only
>> access fields on a row-by-row basis. I have not gotten any replies at all
>> on my question, but I also need to do a more complicated operation in my
>> work (join to another model RDD, flat-map, calculate, reduce) in order to
>> get the value for the new column. So far no great solution.
>>
>> Sorry I don't have any answers, but I wanted to chime in that I am also a
>> bit stuck on similar issues. Hope we can find a workable solution soon.
>> Cheers,
>> Thunder
>>
>>
>> On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty <janardhan...@gmail.com>
>> wrote:
>>
>>> Noticed a few things about Spark transformers; just wanted to be clear.
>>>
>>> Unary transformer:
>>>
>>> createTransformFunc: IN => OUT = { *item* => }
>>> Here *item* is a single element and *NOT* the entire column.
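[Editor's note: the visibility problem Thunder describes (HasInputCols, HasOutputCol, DefaultParamsWritable being private at the time) can be sidestepped without copying Spark source, by declaring the params directly on a public `Transformer` subclass. A minimal sketch, assuming the goal is to combine several numeric columns into one; the class name, param names, and the sum-based combine logic are all illustrative, not Spark API.]

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, StringArrayParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{array, col, udf}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical multi-column transformer that avoids the private shared-param
// traits (HasInputCols, HasOutputCol) by declaring equivalent public params.
class ColumnCombiner(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("colCombiner"))

  final val inputCols = new StringArrayParam(this, "inputCols", "input column names")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  def setInputCols(vs: Array[String]): this.type = set(inputCols, vs)
  def setOutputCol(v: String): this.type = set(outputCol, v)

  // Illustrative combine logic (an assumption): sum the input columns.
  // Note the UDF still sees one row's values at a time, not whole columns.
  private val combine = udf { vs: Seq[Double] => vs.sum }

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), combine(array($(inputCols).map(col): _*)))

  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields :+ StructField($(outputCol), DoubleType, nullable = false))

  override def copy(extra: ParamMap): ColumnCombiner = defaultCopy(extra)
}
```

The trade-off versus copying Spark internals: this class cannot mix in `DefaultParamsWritable`, so it is not persistable via `Pipeline.save` without extra work.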
>>>
>>> I would like to get the number of elements in that particular column.
>>> Since there is *no forward checking*, how can we get this information?
>>> We have visibility into a single element and not the entire column.
>>>
>>> On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty <janardhan...@gmail.com>
>>> wrote:
>>>
>>>> In Scala Spark ML DataFrames.
>>>>
>>>> On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar <
>>>> somasundar.se...@tigeranalytics.com> wrote:
>>>>
>>>>> Can you try this?
>>>>>
>>>>> https://www.linkedin.com/pulse/hive-functions-udfudaf-udtf-examples-gaurav-singh
>>>>>
>>>>> On 4 Sep 2016 9:38 pm, "janardhan shetty" <janardhan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is there any chance that we can send multiple columns in their
>>>>>> entirety to a UDF and generate a new column for Spark ML?
>>>>>> I see a similar approach in VectorAssembler, but I am not able to use
>>>>>> a few classes/traits like HasInputCols, HasOutputCol,
>>>>>> DefaultParamsWritable, since they are private.
>>>>>>
>>>>>> Any leads/examples in this regard are appreciated.
>>>>>>
>>>>>> Requirement:
>>>>>> *Input*: Multiple columns of a Dataframe
>>>>>> *Output*: Single new modified column
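[Editor's note: for the narrow requirement at the bottom of the thread (multiple input columns in, one new column out) the row-by-row UDF route does work in plain Spark SQL, and whole-column facts such as the element count asked about can be computed up front with an aggregate. A minimal sketch; the column names "a" and "b" and the sum-based combine are illustrative stand-ins for the real features.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, udf}

object MultiColUdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-col-udf").getOrCreate()
    import spark.implicits._

    // Hypothetical data; "a" and "b" stand in for the real feature columns.
    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("a", "b")

    // A UDF may take several columns at once, but is still applied row by row.
    val combine = udf((x: Double, y: Double) => x + y)
    val out = df.withColumn("combined", combine(col("a"), col("b")))

    // Whole-column information (e.g. the element count asked about above) is
    // not visible inside a UDF; compute it first with an aggregate and, if a
    // per-row function needs it, capture it in the closure of a second UDF.
    val n = df.agg(count(col("a"))).first().getLong(0)

    out.show()
    println(s"elements in column a: $n")
    spark.stop()
  }
}
```

This does not remove the row-at-a-time limitation Thunder mentions; operations like his join/flat-map/reduce pipeline still need to drop down to the DataFrame/RDD level before the result is attached as a column.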