Re: should I file a bug? Re: trouble implementing Transformer and calling DataFrame.withColumn()

2015-12-22 Thread Jeff Zhang
> ing df.withColumn()");
>
> transformerdDF.printSchema();
>
> logger.info("show() after calling df.withColumn()");
> transformerdDF.show();
>
> logger.info("END");
> }
>
> DataFrame createData() {
>     Features f1 = new Features(1, category1);
>     Features f2 = new Features(2, category2);
>     ArrayList data = new ArrayList(2);
>     data.add(f1);
>     data.add(f2);
>     //JavaRDD rdd = javaSparkContext.parallelize(Arrays.asList(f1, f2)); // does not work
>     JavaRDD rdd = javaSparkContext.parallelize(data);
>     DataFrame df = sqlContext.createDataFrame(rdd, Features.class);
>     return df;
> }
>
> class MyUDF implements UDF1<String, String> {
>     @Override
>     public String call(String s) throws Exception {
>         logger.info("AEDWIP s:{}", s);
>         String ret = s.equalsIgnoreCase(category1) ? category1 : category3;
>         return ret;
>     }
> }
>
> public class Features implements Serializable {
>     private static final long serialVersionUID = 1L;
>     int id;
>     String labelStr;
>
>     Features(int id, String l) {
>         this.id = id;
>         this.labelStr = l;
>     }
>
>     public int getId() { return id; }
>     public void setId(int id) { this.id = id; }
>     public String getLabelStr() { return labelStr; }
>     public void setLabelStr(String labelStr) { this.labelStr = labelStr; }
> }
>
>
>
> From: Andrew Davidson <a...@santacruzintegration.com>
> Date: Monday, December 21, 2015 at 7:47 PM
> To: Jeff Zhang <zjf...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: trouble implementing Transformer and calling
> DataFrame.withColumn()
>
> Hi Jeff,
>
> I took a look at Tokenizer.scala, UnaryTransformer.scala, and
> Transformer.scala. However, I cannot figure out how to implement
> createTransformFunc() in Java 8.
>
> It would be nice to be able to use this transformer in my pipeline, but that
> is not required. The real problem is that I cannot figure out how to create a
> Column I can pass to dataFrame.withColumn() in my Java code. Here is my
> original Python:
>
> binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
> ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>
> Any suggestions would be greatly appreciated.
>
> Andy
>
> public class LabelToBinaryTransformer
>         extends UnaryTransformer<String, String, LabelToBinaryTransformer> {
>
>     private static final long serialVersionUID = 4202800448830968904L;
>     private final UUID uid = UUID.randomUUID();
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Function1<String, String> createTransformFunc() {
>         // original python code:
>         // binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>         // ??? The Function1 interface is not easy to implement; it has lots of functions
>         ???
>     }
>
>     @Override
>     public DataType outputDataType() {
>         StringType ret = new StringType();
>         return ret;
>     }
> }
>
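
For what it's worth, the Function1 question has a fairly small answer: scala.runtime.AbstractFunction1 already fills in everything except apply(), so a Java implementation only needs apply() plus Serializable (UnaryTransformer wraps the returned function in a UDF that gets shipped to executors). A minimal sketch, assuming Spark 1.6 with Scala on the classpath; the class name is just a placeholder:

import java.io.Serializable;

import scala.runtime.AbstractFunction1;

// Only apply() is abstract in AbstractFunction1; andThen/compose/toString come for free.
// Serializable matters because the function is serialized along with the generated UDF.
class Binomial extends AbstractFunction1<String, String> implements Serializable {
    @Override
    public String apply(String labelStr) {
        return "noise".equalsIgnoreCase(labelStr) ? labelStr : "signal";
    }
}

createTransformFunc() can then simply do "return new Binomial();".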
>
> From: Jeff Zhang <zjf...@gmail.com>
> Date: Monday, December 21, 2015 at 6:43 PM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: trouble implementing Transformer and calling
> DataFrame.withColumn()
>
> In your case, I would suggest you extend UnaryTransformer, which is much
> easier.
>
> Yeah, I have to admit that there is no documentation about how to write a
> custom Transformer. I think we need to add that, since writing a custom
> Transformer is a very typical task in machine learning.
>
> On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>>
>> I am trying to port the following python function to Java 8. I would like
>> my java implementation to implement Transform

Re: trouble implementing Transformer and calling DataFrame.withColumn()

2015-12-21 Thread Jeff Zhang
In your case, I would suggest you extend UnaryTransformer, which is much
easier.

Yeah, I have to admit that there is no documentation about how to write a
custom Transformer. I think we need to add that, since writing a custom
Transformer is a very typical task in machine learning.
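
To make that concrete, here is a rough Java 8 sketch of the UnaryTransformer route, assuming Spark 1.6; the class name, uid prefix, and label values are placeholders rather than anything prescribed by the API:

import java.io.Serializable;
import java.util.UUID;

import org.apache.spark.ml.UnaryTransformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

import scala.Function1;
import scala.runtime.AbstractFunction1;

public class LabelToBinaryTransformer
        extends UnaryTransformer<String, String, LabelToBinaryTransformer> {

    private final String uid_ = "labelToBinary_" + UUID.randomUUID();

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Function1<String, String> createTransformFunc() {
        // UnaryTransformer wraps this function in a UDF, so it must be serializable.
        return new Binomial();
    }

    @Override
    public DataType outputDataType() {
        // Use the DataTypes singletons rather than new StringType().
        return DataTypes.StringType;
    }

    @Override
    public LabelToBinaryTransformer copy(ParamMap extra) {
        // defaultCopy() clones the stage and merges in the extra params.
        return defaultCopy(extra);
    }

    // Static nested class: extending AbstractFunction1 leaves only apply() to write.
    private static class Binomial extends AbstractFunction1<String, String>
            implements Serializable {
        @Override
        public String apply(String labelStr) {
            return "noise".equalsIgnoreCase(labelStr) ? labelStr : "signal";
        }
    }
}

UnaryTransformer already supplies the inputCol/outputCol params plus transform() and transformSchema(), so wiring it up should be roughly new LabelToBinaryTransformer().setInputCol("labelStr").setOutputCol("binomialLabel"), with the column names adjusted to the actual DataFrame.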

On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <
a...@santacruzintegration.com> wrote:

>
> I am trying to port the following python function to Java 8. I would like
> my java implementation to implement Transformer so I can use it in a
> pipeline.
>
> I am having a heck of a time trying to figure out how to create a Column
> variable I can pass to DataFrame.withColumn(). As far as I know,
> withColumn() is the only way to append a column to a data frame.
>
> Any comments or suggestions would be greatly appreciated.
>
> Andy
>
>
> def convertMultinomialLabelToBinary(dataFrame):
>     newColName = "binomialLabel"
>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>     return ret
>
> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
>
>
>
> public class LabelToBinaryTransformer extends Transformer {
>
>     private static final long serialVersionUID = 4202800448830968904L;
>     private final UUID uid = UUID.randomUUID();
>     public String inputCol;
>     public String outputCol;
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Transformer copy(ParamMap pm) {
>         Params xx = defaultCopy(pm);
>         return ???;
>     }
>
>     @Override
>     public DataFrame transform(DataFrame df) {
>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>         Column c = df.col(inputCol);
>         // ??? UDF apply does not take a col
>         Column col = myUDF.apply(df.col(inputCol));
>         DataFrame ret = df.withColumn(outputCol, col);
>         return ret;
>     }
>
>     @Override
>     public StructType transformSchema(StructType arg0) {
>         // ??? What is this function supposed to do ???
>         // ??? Is this the type of the new output column ???
>     }
>
>     class MyUDF extends UserDefinedFunction {
>
>         public MyUDF(Object f, DataType dataType, Seq inputTypes) {
>             super(f, dataType, inputTypes);
>             // ??? Why do I have to implement this constructor ???
>             // ??? What are the arguments ???
>         }
>
>         @Override
>         public Column apply(scala.collection.Seq exprs) {
>             // What do you do with a scala seq?
>             return ???;
>         }
>     }
> }
>
>
>


-- 
Best Regards

Jeff Zhang


trouble implementing Transformer and calling DataFrame.withColumn()

2015-12-21 Thread Andy Davidson

I am trying to port the following python function to Java 8. I would like my
java implementation to implement Transformer so I can use it in a pipeline.

I am having a heck of a time trying to figure out how to create a Column
variable I can pass to DataFrame.withColumn(). As far as I know, withColumn()
is the only way to append a column to a data frame.

Any comments or suggestions would be greatly appreciated.

Andy


def convertMultinomialLabelToBinary(dataFrame):
    newColName = "binomialLabel"
    binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
    ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
    return ret

trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
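
For reference, a rough sketch of one way to get such a Column in plain Java, assuming Spark 1.5+ (where functions.callUDF(name, columns...) is available) and the same sqlContext, trainingDF1, and "label" column as above; the registered name "binomial" is only a placeholder. A Transformer-flavored version of the same idea is sketched after the class below.

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

static DataFrame convertMultinomialLabelToBinary(SQLContext sqlContext, DataFrame df) {
    // Register a named Java UDF once; callUDF() then yields a Column
    // that DataFrame.withColumn() accepts.
    sqlContext.udf().register("binomial", new UDF1<String, String>() {
        @Override
        public String call(String labelStr) throws Exception {
            return "noise".equalsIgnoreCase(labelStr) ? labelStr : "signal";
        }
    }, DataTypes.StringType);
    return df.withColumn("binomialLabel", callUDF("binomial", col("label")));
}

// usage: DataFrame trainingDF2 = convertMultinomialLabelToBinary(sqlContext, trainingDF1);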


public class LabelToBinaryTransformer extends Transformer {

    private static final long serialVersionUID = 4202800448830968904L;
    private final UUID uid = UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Transformer copy(ParamMap pm) {
        Params xx = defaultCopy(pm);
        return ???;
    }

    @Override
    public DataFrame transform(DataFrame df) {
        MyUDF myUDF = new MyUDF(myUDF, null, null);
        Column c = df.col(inputCol);
        // ??? UDF apply does not take a col
        Column col = myUDF.apply(df.col(inputCol));
        DataFrame ret = df.withColumn(outputCol, col);
        return ret;
    }

    @Override
    public StructType transformSchema(StructType arg0) {
        // ??? What is this function supposed to do ???
        // ??? Is this the type of the new output column ???
    }

    class MyUDF extends UserDefinedFunction {

        public MyUDF(Object f, DataType dataType, Seq inputTypes) {
            super(f, dataType, inputTypes);
            // ??? Why do I have to implement this constructor ???
            // ??? What are the arguments ???
        }

        @Override
        public Column apply(scala.collection.Seq exprs) {
            // What do you do with a scala seq?
            return ???;
        }
    }
}
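
Continuing the earlier sketch, here is a hedged guess at how the ??? methods in a hand-rolled Transformer like the one above might be filled in, assuming Spark 1.6. transformSchema() only has to report the output schema for a given input schema (so a Pipeline can validate its stages before any data flows), copy() can usually delegate to defaultCopy(), and transform() can lean on a registered Java UDF instead of constructing UserDefinedFunction by hand. The UDF name "binomialUDF" is just a placeholder.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.callUDF;

public class LabelToBinaryTransformer extends Transformer {

    private final String uid_ = "labelToBinary_" + UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        // Clone this stage and merge in the extra params.
        return defaultCopy(extra);
    }

    @Override
    public DataFrame transform(DataFrame df) {
        // Register a plain Java UDF, then callUDF() gives a Column for withColumn().
        df.sqlContext().udf().register("binomialUDF", new UDF1<String, String>() {
            @Override
            public String call(String labelStr) throws Exception {
                return "noise".equalsIgnoreCase(labelStr) ? labelStr : "signal";
            }
        }, DataTypes.StringType);
        Column newCol = callUDF("binomialUDF", df.col(inputCol));
        return df.withColumn(outputCol, newCol);
    }

    @Override
    public StructType transformSchema(StructType schema) {
        // Keep every input column and append the new string output column.
        List<StructField> fields = new ArrayList<>(Arrays.asList(schema.fields()));
        fields.add(DataTypes.createStructField(outputCol, DataTypes.StringType, false));
        return DataTypes.createStructType(fields);
    }
}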