[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190904#comment-16190904 ]

Akos Tomasits edited comment on SPARK-12606 at 10/4/17 9:02 AM:
----------------------------------------------------------------

We have run into the same issue. We cannot create proper Java transformers 
derived from UnaryTransformer.

We would like to use these custom transformers with CrossValidator, which 
ultimately requires a constructor that takes a String (uid) parameter. The 
custom transformer is presumably supposed to store the provided uid in that 
constructor; however, the object's uid() method is called before the 
constructor finishes. This leads to the above-mentioned "null__inputCol" error.

I have created a new JIRA issue for this problem: SPARK-22198
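The initialization-order problem described above can be reproduced without Spark at all; it is a general Java pitfall that arises when a superclass constructor calls an overridable method. A minimal sketch (class and field names here are hypothetical, chosen only to mirror the shape of the Spark API):

```java
import java.util.UUID;

// Stands in for the Spark base class: its constructor builds a param name
// from uid() while the subclass is still mid-construction.
class ParamBase {
    final String inputColParamName;
    ParamBase() {
        // Virtual dispatch reaches the subclass override of uid() here,
        // before the subclass's own field initializers have run.
        inputColParamName = uid() + "__inputCol";
    }
    String uid() { return "base"; }
}

// Mirrors the Stemmer in the report: uid is a final field, so the override
// returns null during super construction -> "null__inputCol".
class BrokenStemmer extends ParamBase {
    private final String uid = "Stemmer_" + UUID.randomUUID();
    @Override public String uid() { return uid; }
}

// Workaround: initialize the uid lazily inside uid() itself, so it is
// already valid when the superclass constructor asks for it.
class FixedStemmer extends ParamBase {
    private String uid; // no field initializer, assigned on first call
    @Override public String uid() {
        if (uid == null) uid = "Stemmer_" + UUID.randomUUID();
        return uid;
    }
}

public class UidInitDemo {
    public static void main(String[] args) {
        System.out.println(new BrokenStemmer().inputColParamName); // null__inputCol
        System.out.println(new FixedStemmer().inputColParamName);
    }
}
```

Because a subclass's field initializers run only after the superclass constructor returns, the field-backed uid() override still sees null while the base class is registering its params; lazily initializing the uid inside uid() itself sidesteps that ordering.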



> Scala/Java compatibility issue Re: how to extend java transformer from Scala 
> UnaryTransformer ?
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12606
>                 URL: https://issues.apache.org/jira/browse/SPARK-12606
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.5.2
>         Environment: Java 8, Mac OS, Spark-1.5.2
>            Reporter: Andrew Davidson
>              Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue, I can also 
> reproduce this issue, so could you file a JIRA to track this issue?
> Yanbo
> 2016-01-02 3:38 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:
> I am trying to write a trivial transformer to use in my pipeline. I am 
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala 
> class as an example. This should be very easy; however, I do not understand 
> Scala and am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol 
> does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>       at scala.Predef$.require(Predef.scala:233)
>       at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>       at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>       at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>       at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>       at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>       at 
> org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>       at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
> public class StemmerTest extends AbstractSparkTest {
>     @Test
>     public void test() {
>         Stemmer stemmer = new Stemmer()
>                                 .setInputCol("raw") //line 30
>                                 .setOutputCol("filtered");
>     }
> }
> /**
>  * @see 
> spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @see 
> https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @see 
> http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  * 
>  * @author andrewdavidson
>  *
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, 
> Stemmer> implements Serializable{
>     static Logger logger = LoggerFactory.getLogger(Stemmer.class);
>     private static final long serialVersionUID = 1L;
>     private static final  ArrayType inputType = 
> DataTypes.createArrayType(DataTypes.StringType, true);
>     private final String uid = Stemmer.class.getSimpleName() + "_" + 
> UUID.randomUUID().toString();
>     @Override
>     public String uid() {
>         return uid;
>     }
>     /*
>        override protected def validateInputType(inputType: DataType): Unit = {
>     require(inputType == StringType, s"Input type must be string type but got 
> $inputType.")
>   }
>      */
>     @Override
>     public void validateInputType(DataType inputTypeArg) {
>         String msg = "inputType must be " + inputType.simpleString() + " but 
> got " + inputTypeArg.simpleString();
>         assert (inputType.equals(inputTypeArg)) : msg; 
>     }
>     
>     @Override
>     public Function1<List<String>, List<String>> createTransformFunc() {
>         // 
> http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
>         Function1<List<String>, List<String>> f = new 
> AbstractFunction1<List<String>, List<String>>() {
>             public List<String> apply(List<String> words) {
>                 for(String word : words) {
>                     logger.error("AEDWIP input word: {}", word);
>                 }
>                 return words;
>             }
>         };
>         
>         return f;
>     }
>     @Override
>     public DataType outputDataType() {
>         return DataTypes.createArrayType(DataTypes.StringType, true);
>     }
> }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
