[jira] [Commented] (SPARK-17048) ML model read for custom transformers in a pipeline does not work
[ https://issues.apache.org/jira/browse/SPARK-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592141#comment-15592141 ]

Nicolas Long commented on SPARK-17048:
--------------------------------------

I hit this today too. The Scala workaround is simply to create a companion object (an object with the same name as the class) that extends DefaultParamsReadable. E.g.

{code:java}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.jsoup.Jsoup

// StringUnaryTransformer is a custom wrapper trait, not part of Spark.
class HtmlRemover(val uid: String) extends StringUnaryTransformer[String, HtmlRemover] with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("htmlremover"))

  // Strip the markup and keep only the text of the document body.
  def createTransformFunc: String => String = s => {
    Jsoup.parse(s).body().text()
  }
}

// The companion object supplies the read method used when loading.
object HtmlRemover extends DefaultParamsReadable[HtmlRemover]
{code}

Note that StringUnaryTransformer is a simple custom wrapper trait here.

> ML model read for custom transformers in a pipeline does not work
> -----------------------------------------------------------------
>
>                 Key: SPARK-17048
>                 URL: https://issues.apache.org/jira/browse/SPARK-17048
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>         Environment: Spark 2.0.0
>                      Java API
>            Reporter: Taras Matyashovskyy
>              Labels: easyfix, features
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> 0. Use Java API :(
> 1. Create any custom ML transformer
> 2. Make it MLReadable and MLWritable
> 3. Add to pipeline
> 4. Evaluate model, e.g. CrossValidationModel, and save results to disk
> 5. For the custom transformer you can use DefaultParamsReader and DefaultParamsWriter, for instance
> 6. Load model from saved directory
> 7. All out-of-the-box objects are loaded successfully, e.g. Pipeline, Evaluator, etc.
> 8. Your custom transformer will fail with NPE
>
> Reason:
> ReadWrite.scala:447
> cls.getMethod("read").invoke(null).asInstanceOf[MLReader[T]].load(path)
> In Java this only works for static methods.
> As we are implementing MLReadable or MLWritable, this call should be an instance method call.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17048) ML model read for custom transformers in a pipeline does not work
[ https://issues.apache.org/jira/browse/SPARK-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473009#comment-15473009 ]

Yicheng Luo commented on SPARK-17048:
-------------------------------------

This fails because, seen from Java, the MLReadable trait becomes an interface. The read method is therefore an instance method, which needs an object to be invoked on. In ReadWrite.scala:447, however, invoke is passed a null receiver, which is only valid when the method is static. Since read is an instance method here, the call fails.
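The behaviour described in this comment can be reproduced with plain Java reflection, without Spark on the classpath. The sketch below is only an illustration: InstanceRead, StaticRead and ReflectiveReadDemo are hypothetical stand-ins, not Spark classes, and the lookup merely mirrors the shape of the cls.getMethod("read").invoke(null) call quoted in the issue.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in: read() is an instance method, as it would be on a
// Java class that implements an interface declaring read().
final class InstanceRead {
    public String read() {
        return "instance reader";
    }
}

// Hypothetical stand-in: read() is static, so no receiver object is needed.
final class StaticRead {
    public static String read() {
        return "static reader";
    }
}

final class ReflectiveReadDemo {
    // Looks up a public no-argument read() and invokes it with a null
    // receiver, the same call shape as the line cited from ReadWrite.scala.
    static String describe(Class<?> cls) {
        try {
            Method read = cls.getMethod("read");
            return "ok: " + read.invoke(null);
        } catch (NullPointerException e) {
            // Method.invoke throws this when an instance method gets a null receiver.
            return "NullPointerException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(StaticRead.class));
        System.out.println(describe(InstanceRead.class));
    }
}
```

With a static read() the reflective call returns normally; with an instance read() the null receiver produces the NullPointerException mentioned in the report.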
[jira] [Commented] (SPARK-17048) ML model read for custom transformers in a pipeline does not work
[ https://issues.apache.org/jira/browse/SPARK-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426358#comment-15426358 ]

Taras Matyashovskyy commented on SPARK-17048:
---------------------------------------------

All my examples are located under https://github.com/tmatyashovsky/spark-ml-samples and are quite self-explanatory.

The issue I am describing occurs with any of the custom transformers located here:
https://github.com/tmatyashovsky/spark-ml-samples/tree/master/spark-driver/src/main/java/com/lohika/morning/ml/spark/driver/service/lyrics

To work around it, I explicitly added a static method, public static MLReader read(), without implementing any interface (e.g. MLReadable, DefaultParamsReadable), and that worked. But that is just a workaround, so it would be great to have this fixed in ReadWrite.scala:447 too.

Please let me know if you have further questions.
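As a rough sketch of why the static-method workaround succeeds, the following plain-Java example gives a class a public static read() and loads it through the same reflective call shape. Everything here is an assumption made for illustration: SketchReader stands in for a reader type, SketchTransformer for a custom stage, and neither is the Spark API or the code from the linked repository.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a reader type; not Spark's MLReader.
interface SketchReader<T> {
    T load(String path);
}

// Hypothetical custom stage. It implements no readable interface and only
// exposes a public static read(), as the workaround describes.
final class SketchTransformer {
    final String path;

    SketchTransformer(String path) {
        this.path = path;
    }

    public static SketchReader<SketchTransformer> read() {
        // The "reader" here simply remembers the path it was asked to load.
        return SketchTransformer::new;
    }
}

final class StaticReadWorkaround {
    // Finds read() reflectively and invokes it with a null receiver, which
    // is legal because read() is static.
    @SuppressWarnings("unchecked")
    static <T> T loadReflectively(Class<T> cls, String path) {
        try {
            Method read = cls.getMethod("read");
            SketchReader<T> reader = (SketchReader<T>) read.invoke(null);
            return reader.load(path);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("read() could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        SketchTransformer loaded = loadReflectively(SketchTransformer.class, "/tmp/model");
        System.out.println(loaded.path);
    }
}
```

Because the lookup is by method name only, the class does not need to implement any interface for the call to resolve; it only needs a public static, no-argument read().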
[jira] [Commented] (SPARK-17048) ML model read for custom transformers in a pipeline does not work
[ https://issues.apache.org/jira/browse/SPARK-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422365#comment-15422365 ]

Yanbo Liang commented on SPARK-17048:
-------------------------------------

[~taras.matyashov...@gmail.com] Would you mind sharing your code or providing a simple example so that others can help you diagnose this issue? Thanks!