[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737802#comment-14737802 ]
Joseph K. Bradley commented on SPARK-6724:
------------------------------------------

Now that the 1.5 release stuff is over, yes! Thanks for your patience.

I will assume:
* FPGrowth should continue to support arbitrary types (in the spark.mllib API). I.e., we should not change its public interface.
* Like other models, FPGrowth should use DataFrame serialization for model save/load.

Given these constraints, I think the best way to implement save/load is:
* Use DataFrames/Catalyst to test whether the item type is a type recognized by Catalyst (probably using {{ScalaReflection.schemaFor}}).
* If the item type is not OK, throw an error.
* If the item type is OK, save as a DataFrame.

We should definitely support all DataFrame types. There is no need to limit items to primitive Catalyst types.

In the future, once UDTs are a public API, we could allow users to make their custom types implement the UDT interface so that we can convert them to Catalyst types.

[~MeethuMathew] Can you please update your PR accordingly? I should have time to give feedback or collaborate on the coding.

> Model import/export for FPGrowth
> --------------------------------
>
>                 Key: SPARK-6724
>                 URL: https://issues.apache.org/jira/browse/SPARK-6724
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Note: experimental model API
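
For reference, the item-type check proposed in the comment above could look roughly like the sketch below. It is only an illustration, not the code from the PR. The names {{ItemTypeCheck}} and {{catalystTypeFor}} are hypothetical. It assumes that {{ScalaReflection.schemaFor}} throws for types Catalyst does not recognize, which is how this internal (non-public) helper behaved around Spark 1.5 and may differ in other versions.

```scala
import scala.reflect.runtime.universe.TypeTag
import scala.util.{Failure, Success, Try}

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.DataType

// Hypothetical helper: decides whether an FPGrowth item type can be
// stored in a DataFrame column, as outlined in the comment.
object ItemTypeCheck {

  // Returns the Catalyst DataType for Item, or throws if Catalyst has
  // no mapping for it (assumption: schemaFor itself throws in that case).
  def catalystTypeFor[Item: TypeTag]: DataType =
    Try(ScalaReflection.schemaFor[Item]) match {
      case Success(schema) =>
        // Item type is OK: this type can be used for the saved column.
        schema.dataType
      case Failure(cause) =>
        // Item type is not OK: fail fast instead of writing a broken model.
        throw new UnsupportedOperationException(
          "FPGrowthModel.save only supports item types recognized by " +
            "Catalyst; no schema could be derived for the item type.",
          cause)
    }
}
```

A save implementation would presumably call something like {{ItemTypeCheck.catalystTypeFor[String]}} before building the DataFrame, so that an unsupported item type is rejected before any data is written.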