Re: should I file a bug? Re: trouble implementing Transformer and calling DataFrame.withColumn()
> ing df.withColumn()");
>     transformerdDF.printSchema();
>     logger.info("show() after calling df.withColumn()");
>     transformerdDF.show();
>
>     logger.info("END");
> }
>
> DataFrame createData() {
>     Features f1 = new Features(1, category1);
>     Features f2 = new Features(2, category2);
>     ArrayList<Features> data = new ArrayList<>(2);
>     data.add(f1);
>     data.add(f2);
>     //JavaRDD rdd = javaSparkContext.parallelize(Arrays.asList(f1, f2)); // does not work
>     JavaRDD<Features> rdd = javaSparkContext.parallelize(data);
>     DataFrame df = sqlContext.createDataFrame(rdd, Features.class);
>     return df;
> }
>
> class MyUDF implements UDF1<String, String> {
>     @Override
>     public String call(String s) throws Exception {
>         logger.info("AEDWIP s:{}", s);
>         String ret = s.equalsIgnoreCase(category1) ? category1 : category3;
>         return ret;
>     }
> }
>
> public class Features implements Serializable {
>     private static final long serialVersionUID = 1L;
>     int id;
>     String labelStr;
>
>     Features(int id, String l) {
>         this.id = id;
>         this.labelStr = l;
>     }
>
>     public int getId() {
>         return id;
>     }
>
>     public void setId(int id) {
>         this.id = id;
>     }
>
>     public String getLabelStr() {
>         return labelStr;
>     }
>
>     public void setLabelStr(String labelStr) {
>         this.labelStr = labelStr;
>     }
> }
>
> From: Andrew Davidson <a...@santacruzintegration.com>
> Date: Monday, December 21, 2015 at 7:47 PM
> To: Jeff Zhang <zjf...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: trouble implementing Transformer and calling DataFrame.withColumn()
>
> Hi Jeff
>
> I took a look at Tokenizer.scala, UnaryTransformer.scala, and
> Transformer.scala. However, I cannot figure out how to implement
> createTransformFunc() in Java 8.
>
> It would be nice to be able to use this transformer in my pipeline, but it
> is not required. The real problem is that I cannot figure out how to create
> a Column I can pass to dataFrame.withColumn() in my Java code. Here is my
> original Python:
>
> binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
> ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>
> Any suggestions would be greatly appreciated.
>
> Andy
>
> public class LabelToBinaryTransformer
>         extends UnaryTransformer<String, String, LabelToBinaryTransformer> {
>
>     private static final long serialVersionUID = 4202800448830968904L;
>     private final UUID uid = UUID.randomUUID();
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Function1<String, String> createTransformFunc() {
>         // original python code
>         // binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>         // The Function1 interface is not easy to implement; it has lots of functions.
>         ???
>     }
>
>     @Override
>     public DataType outputDataType() {
>         StringType ret = new StringType();
>         return ret;
>     }
> }
>
> From: Jeff Zhang <zjf...@gmail.com>
> Date: Monday, December 21, 2015 at 6:43 PM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: trouble implementing Transformer and calling DataFrame.withColumn()
>
> In your case, I would suggest extending UnaryTransformer, which is much
> easier.
>
> Yeah, I have to admit that there is no documentation on how to write a
> custom Transformer. I think we need to add that, since writing a custom
> Transformer is a very common task in machine learning.
>
> On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>
>> I am trying to port the following python function to Java 8. I would like
>> my java implementation to implement Transform
Re: trouble implementing Transformer and calling DataFrame.withColumn()
In your case, I would suggest extending UnaryTransformer, which is much easier.

Yeah, I have to admit that there is no documentation on how to write a custom Transformer. I think we need to add that, since writing a custom Transformer is a very common task in machine learning.

On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I am trying to port the following python function to Java 8. I would like
> my java implementation to implement Transformer so I can use it in a
> pipeline.
>
> I am having a heck of a time trying to figure out how to create a Column
> variable I can pass to DataFrame.withColumn(). As far as I know,
> withColumn() is the only way to append a column to a data frame.
>
> Any comments or suggestions would be greatly appreciated.
>
> Andy
>
> def convertMultinomialLabelToBinary(dataFrame):
>     newColName = "binomialLabel"
>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>     return ret
>
> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
>
> public class LabelToBinaryTransformer extends Transformer {
>
>     private static final long serialVersionUID = 4202800448830968904L;
>     private final UUID uid = UUID.randomUUID();
>     public String inputCol;
>     public String outputCol;
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Transformer copy(ParamMap pm) {
>         Params xx = defaultCopy(pm);
>         return ???;
>     }
>
>     @Override
>     public DataFrame transform(DataFrame df) {
>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>         Column c = df.col(inputCol);
>         // ??? UDF apply does not take a col
>         Column col = myUDF.apply(df.col(inputCol));
>         DataFrame ret = df.withColumn(outputCol, col);
>         return ret;
>     }
>
>     @Override
>     public StructType transformSchema(StructType arg0) {
>         // ??? What is this function supposed to do ???
>         // ??? Is this the type of the new output column
>     }
>
>     class MyUDF extends UserDefinedFunction {
>         public MyUDF(Object f, DataType dataType, Seq inputTypes) {
>             super(f, dataType, inputTypes);
>             // ??? Why do I have to implement this constructor ???
>             // ??? What are the arguments ???
>         }
>
>         @Override
>         public Column apply(scala.collection.Seq exprs) {
>             // What do you do with a scala seq?
>             return ???;
>         }
>     }
> }

--
Best Regards

Jeff Zhang
trouble implementing Transformer and calling DataFrame.withColumn()
I am trying to port the following python function to Java 8. I would like my java implementation to implement Transformer so I can use it in a pipeline.

I am having a heck of a time trying to figure out how to create a Column variable I can pass to DataFrame.withColumn(). As far as I know, withColumn() is the only way to append a column to a data frame.

Any comments or suggestions would be greatly appreciated.

Andy

def convertMultinomialLabelToBinary(dataFrame):
    newColName = "binomialLabel"
    binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
    ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
    return ret

trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)

public class LabelToBinaryTransformer extends Transformer {

    private static final long serialVersionUID = 4202800448830968904L;
    private final UUID uid = UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Transformer copy(ParamMap pm) {
        Params xx = defaultCopy(pm);
        return ???;
    }

    @Override
    public DataFrame transform(DataFrame df) {
        MyUDF myUDF = new MyUDF(myUDF, null, null);
        Column c = df.col(inputCol);
        // ??? UDF apply does not take a col
        Column col = myUDF.apply(df.col(inputCol));
        DataFrame ret = df.withColumn(outputCol, col);
        return ret;
    }

    @Override
    public StructType transformSchema(StructType arg0) {
        // ??? What is this function supposed to do ???
        // ??? Is this the type of the new output column
    }

    class MyUDF extends UserDefinedFunction {
        public MyUDF(Object f, DataType dataType, Seq inputTypes) {
            super(f, dataType, inputTypes);
            // ??? Why do I have to implement this constructor ???
            // ??? What are the arguments ???
        }

        @Override
        public Column apply(scala.collection.Seq exprs) {
            // What do you do with a scala seq?
            return ???;
        }
    }
}
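
For reference, the label mapping performed by the Python lambda can be sketched in plain Java 8, independent of Spark. This is only a minimal sketch: the class name BinaryLabelSketch and its methods are hypothetical, and a plain List stands in for the DataFrame column. In the Spark 1.x Java API, the same logic would presumably go into the call() method of a UDF1<String, String> (like the MyUDF class posted elsewhere in this thread), be registered through sqlContext.udf().register(...), and be turned into a Column with functions.callUDF(...); treat that wiring as an assumption to check against your Spark version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of Spark or of the original post.
final class BinaryLabelSketch {

    // Mirrors the Python lambda: keep "noise", map every other label to "signal".
    static String toBinaryLabel(String labelStr) {
        return "noise".equals(labelStr) ? labelStr : "signal";
    }

    // Applies the mapping to a plain list, which stands in for a DataFrame column.
    static List<String> toBinaryLabels(List<String> labels) {
        return labels.stream()
                .map(BinaryLabelSketch::toBinaryLabel)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample labels are made up for illustration.
        List<String> labels = Arrays.asList("noise", "whale", "ship");
        System.out.println(toBinaryLabels(labels));
    }
}
```

Keeping the mapping in a small static method makes it easy to unit test before wiring it into a UDF or a Transformer.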