[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051726#comment-16051726 ]
Ilyes Hachani commented on SPARK-12606:
---------------------------------------

[~srowen] Can I rewrite it? What does setting the UID in the constructor mean?

> Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12606
>                 URL: https://issues.apache.org/jira/browse/SPARK-12606
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.5.2
>        Environment: Java 8, Mac OS, Spark-1.5.2
>           Reporter: Andrew Davidson
>             Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue. I can also
> reproduce this issue, so could you file a JIRA to track it?
> Yanbo
>
> 2016-01-02 3:38 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:
> I am trying to write a trivial transformer to use in my pipeline. I am
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala
> class as an example. This should be very easy, however I do not understand
> Scala and am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
>
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol
> does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>     at scala.Predef$.require(Predef.scala:233)
>     at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>     at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>     at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>     at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>     at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>     at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>     at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
>
> public class StemmerTest extends AbstractSparkTest {
>     @Test
>     public void test() {
>         Stemmer stemmer = new Stemmer()
>                 .setInputCol("raw")       // line 30
>                 .setOutputCol("filtered");
>     }
> }
>
> /**
>  * @see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  *
>  * @author andrewdavidson
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer> implements Serializable {
>     static Logger logger = LoggerFactory.getLogger(Stemmer.class);
>     private static final long serialVersionUID = 1L;
>     private static final ArrayType inputType =
>             DataTypes.createArrayType(DataTypes.StringType, true);
>     private final String uid = Stemmer.class.getSimpleName() + "_"
>             + UUID.randomUUID().toString();
>
>     @Override
>     public String uid() {
>         return uid;
>     }
>
>     /*
>     override protected def validateInputType(inputType: DataType): Unit = {
>         require(inputType == StringType, s"Input type must be string type but got $inputType.")
>     }
>     */
>     @Override
>     public void validateInputType(DataType inputTypeArg) {
>         String msg = "inputType must be " + inputType.simpleString()
>                 + " but got " + inputTypeArg.simpleString();
>         assert (inputType.equals(inputTypeArg)) : msg;
>     }
>
>     @Override
>     public Function1<List<String>,
> List<String>> createTransformFunc() {
>         // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
>         Function1<List<String>, List<String>> f =
>                 new AbstractFunction1<List<String>, List<String>>() {
>             public List<String> apply(List<String> words) {
>                 for (String word : words) {
>                     logger.error("AEDWIP input word: {}", word);
>                 }
>                 return words;
>             }
>         };
>         return f;
>     }
>
>     @Override
>     public DataType outputDataType() {
>         return DataTypes.createArrayType(DataTypes.StringType, true);
>     }
> }

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
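For what it's worth, the `Param null__inputCol` failure quoted above looks like a plain Java field-initialization-order issue: the Scala `UnaryTransformer` superclass creates its params during its own constructor, calling the overridden `uid()` before the Java subclass's field initializer for `uid` has run, so the param is registered under a `null` uid. A minimal, Spark-free sketch of the ordering problem and the usual lazy-initialization workaround (all class and field names here are hypothetical, chosen only to mirror the report):

```java
// Minimal, Spark-free reproduction of the SPARK-12606 initialization-order
// pitfall. Base mimics UnaryTransformer: its constructor calls uid() to
// name a "param", just as Spark's shared param traits do during construction.
class Base {
    final String paramName;

    Base() {
        // Runs BEFORE any subclass field initializers have executed.
        paramName = uid() + "__inputCol";
    }

    String uid() {
        return "base";
    }
}

// Broken: the subclass field initializer has not run yet when Base's
// constructor calls uid(), so the param is named "null__inputCol".
class BrokenStemmer extends Base {
    private final String uid = "Stemmer_1";

    @Override
    String uid() {
        return uid;
    }
}

// Workaround: initialize the uid lazily inside uid() itself, so the very
// first call (made from the super constructor) already returns a real value.
class FixedStemmer extends Base {
    private String uid;

    @Override
    String uid() {
        if (uid == null) {
            uid = "Stemmer_1"; // e.g. class name + a random UUID in practice
        }
        return uid;
    }
}

public class UidOrderDemo {
    public static void main(String[] args) {
        System.out.println(new BrokenStemmer().paramName); // null__inputCol
        System.out.println(new FixedStemmer().paramName);  // Stemmer_1__inputCol
    }
}
```

Applied to the reported `Stemmer`, the same lazy pattern inside `uid()` should give the params a stable, non-null parent uid at construction time; the value just needs to be computed on first call rather than in a field initializer.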