Yep, totally with you on this. None of it is ideal, but it doesn't sound like
there will be any changes coming to the visibility of the ML supporting classes.
-Thunder

On Mon, Sep 12, 2016 at 10:10 AM janardhan shetty <janardhan...@gmail.com> wrote:
> Thanks Thunder. Copying the code base is difficult, since we would need to
> copy it in its entirety, including transitive dependency files.
> Complex operations that take a column as a whole, instead of each element
> in a row, are not possible as of now.
>
> Trying to find a few pointers to solve this easily.
>
> On Mon, Sep 12, 2016 at 9:43 AM, Thunder Stumpges <
> thunder.stump...@gmail.com> wrote:
>
>> Hi Janardhan,
>>
>> I have run into similar issues and asked similar questions. I also ran
>> into many problems with private code when trying to write my own
>> Model/Transformer/Estimator. (You might be able to find my question to the
>> group regarding this; I can't really tell if my emails are getting through,
>> as I don't get any responses.) For now I have resorted to copying the
>> code that I need out of the Spark codebase and into mine. I'm certain this
>> is not the best approach, but it has to be better than "implementing it
>> myself," which is what the only response to my question suggested.
>>
>> As for the transforms, I also asked a similar question. The only way I've
>> seen it done in code is using a UDF. As you mention, the UDF can only
>> access fields on a row-by-row basis. I have not gotten any replies at all
>> on my question, but I also need to do a more complicated operation in my
>> work (join to another model RDD, flat-map, calculate, reduce) in order to
>> get the value for the new column. So far no great solution.
>>
>> Sorry I don't have any answers, but I wanted to chime in that I am also a
>> bit stuck on similar issues. Hope we can find a workable solution soon.
>> Cheers,
>> Thunder
>>
>>
>> On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty <janardhan...@gmail.com>
>> wrote:
>>
>>> Noticed a few things about Spark transformers; just wanted to be clear.
>>>
>>> Unary transformer:
>>>
>>> createTransformFunc: IN => OUT = { *item* => }
>>> Here *item* is a single element and *NOT* the entire column.
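[Editor's note: the visibility problem Thunder describes (HasInputCols, HasOutputCol, DefaultParamsWritable being private at the time) can be sidestepped without copying Spark source, by declaring the params directly on a public `Transformer` subclass. A minimal sketch, assuming the goal is to combine several numeric columns into one; the class name, param names, and the sum-based combine logic are all illustrative, not Spark API.]

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, StringArrayParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{array, col, udf}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical multi-column transformer that avoids the private shared-param
// traits (HasInputCols, HasOutputCol) by declaring equivalent public params.
class ColumnCombiner(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("colCombiner"))

  final val inputCols = new StringArrayParam(this, "inputCols", "input column names")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  def setInputCols(vs: Array[String]): this.type = set(inputCols, vs)
  def setOutputCol(v: String): this.type = set(outputCol, v)

  // Illustrative combine logic (an assumption): sum the input columns.
  // Note the UDF still sees one row's values at a time, not whole columns.
  private val combine = udf { vs: Seq[Double] => vs.sum }

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), combine(array($(inputCols).map(col): _*)))

  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields :+ StructField($(outputCol), DoubleType, nullable = false))

  override def copy(extra: ParamMap): ColumnCombiner = defaultCopy(extra)
}
```

The trade-off versus copying Spark internals: this class cannot mix in `DefaultParamsWritable`, so it is not persistable via `Pipeline.save` without extra work.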
>>>
>>> I would like to get the number of elements in that particular column.
>>> Since there is *no forward checking*, how can we get this information?
>>> We have visibility into a single element and not the entire column.
>>>
>>> On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty <janardhan...@gmail.com>
>>> wrote:
>>>
>>>> In Scala Spark ML DataFrames.
>>>>
>>>> On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar <
>>>> somasundar.se...@tigeranalytics.com> wrote:
>>>>
>>>>> Can you try this?
>>>>>
>>>>> https://www.linkedin.com/pulse/hive-functions-udfudaf-udtf-examples-gaurav-singh
>>>>>
>>>>> On 4 Sep 2016 9:38 pm, "janardhan shetty" <janardhan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is there any chance that we can send multiple columns in their
>>>>>> entirety to a UDF and generate a new column for Spark ML?
>>>>>> I see a similar approach in VectorAssembler, but I am not able to use
>>>>>> a few classes/traits like HasInputCols, HasOutputCol,
>>>>>> DefaultParamsWritable, since they are private.
>>>>>>
>>>>>> Any leads/examples in this regard are appreciated.
>>>>>>
>>>>>> Requirement:
>>>>>> *Input*: Multiple columns of a Dataframe
>>>>>> *Output*: Single new modified column
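[Editor's note: for the narrow requirement at the bottom of the thread (multiple input columns in, one new column out) the row-by-row UDF route does work in plain Spark SQL, and whole-column facts such as the element count asked about can be computed up front with an aggregate. A minimal sketch; the column names "a" and "b" and the sum-based combine are illustrative stand-ins for the real features.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, udf}

object MultiColUdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-col-udf").getOrCreate()
    import spark.implicits._

    // Hypothetical data; "a" and "b" stand in for the real feature columns.
    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("a", "b")

    // A UDF may take several columns at once, but is still applied row by row.
    val combine = udf((x: Double, y: Double) => x + y)
    val out = df.withColumn("combined", combine(col("a"), col("b")))

    // Whole-column information (e.g. the element count asked about above) is
    // not visible inside a UDF; compute it first with an aggregate and, if a
    // per-row function needs it, capture it in the closure of a second UDF.
    val n = df.agg(count(col("a"))).first().getLong(0)

    out.show()
    println(s"elements in column a: $n")
    spark.stop()
  }
}
```

This does not remove the row-at-a-time limitation Thunder mentions; operations like his join/flat-map/reduce pipeline still need to drop down to the DataFrame/RDD level before the result is attached as a column.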