[jira] [Commented] (SPARK-19899) FPGrowth input column naming

Maciej Szymkiewicz (JIRA) Fri, 10 Mar 2017 09:54:23 -0800

    [ https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905480#comment-15905480 ]


Maciej Szymkiewicz commented on SPARK-19899:
--------------------------------------------

This is just an idea, but I would start with:

- {{featuresCol}} - {{VectorUDT}}
- {{transactionsCol}} - {{Array<\_>}} - for frequent (unordered) pattern mining.
- {{sequencesCol}} - {{Array<Array<\_>>}} - for sequential pattern mining.
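
To make the {{transactionsCol}} idea concrete, here is a rough sketch of what such a shared-param trait could look like. It is an illustration only, not an existing Spark API: the trait name {{HasTransactionsCol}} comes from the issue description, while the default column name {{"transactions"}} and the {{validateTransactionsCol}} helper are assumptions of mine. A {{sequencesCol}} variant would presumably follow the same pattern with a nested array check.

```scala
import org.apache.spark.ml.param.{Param, Params}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical trait: the name follows the suggestion in the issue
// description; it is not part of Spark.
trait HasTransactionsCol extends Params {

  // Name of the input column. Unlike featuresCol, the column is expected
  // to hold array<T> rather than VectorUDT.
  final val transactionsCol: Param[String] = new Param[String](
    this, "transactionsCol", "transactions column name (array<T>)")

  // Assumed default, chosen only for this sketch.
  setDefault(transactionsCol, "transactions")

  final def getTransactionsCol: String = $(transactionsCol)

  // Hypothetical helper: fail fast if the configured column is missing
  // or is not an array type.
  def validateTransactionsCol(schema: StructType): Unit = {
    val name = $(transactionsCol)
    require(schema.fieldNames.contains(name), s"Column $name does not exist.")
    val dataType = schema(name).dataType
    require(dataType.isInstanceOf[ArrayType],
      s"Column $name must be an array type but was ${dataType.simpleString}.")
  }
}
```

An {{Estimator}} mixing this in could call the helper from {{transformSchema}}, so a vector column passed by mistake is rejected before fitting starts.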



> FPGrowth input column naming
> ----------------------------
>
>                 Key: SPARK-19899
>                 URL: https://issues.apache.org/jira/browse/SPARK-19899
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Maciej Szymkiewicz
>
> The current implementation extends {{HasFeaturesCol}}. Personally I find this
> rather unfortunate. Up to this point we have used a consistent convention: if we
> mix in {{HasFeaturesCol}}, the {{featuresCol}} should be {{VectorUDT}}.
> Using the same {{Param}} for an {{array<T>}} (and possibly for
> {{array<array<T>>}} once {{PrefixSpan}} is ported to {{ml}}) will be
> confusing for users.
> I would like to suggest adding a new {{trait}} (let's say
> {{HasTransactionsCol}}) to clearly indicate that the input type differs from
> that of the other {{Estimators}}.



