[ https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905451#comment-15905451 ]

yuhao yang commented on SPARK-19899:
------------------------------------

{quote}
 if we mix-in HasFeaturesCol the featuresCol should be VectorUDT.
{quote}

I guess I misunderstood and thought you wanted to support vectors for FPGrowth.
Using a SparseVector to represent records seems reasonable to me, and
supporting that would be easy and straightforward. But we certainly don't need to
support it until there's a clear requirement. How would you like to name the
new trait for array<array<_>>? Users will need to invoke
set<newTrait>Col("...") during fitting, so seeing the name will tell us whether it's more intuitive.

> FPGrowth input column naming
> ----------------------------
>
>                 Key: SPARK-19899
>                 URL: https://issues.apache.org/jira/browse/SPARK-19899
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Maciej Szymkiewicz
>
> The current implementation extends {{HasFeaturesCol}}. Personally I find this
> rather unfortunate. Up to this point we have used a consistent convention: if we
> mix in {{HasFeaturesCol}}, the {{featuresCol}} should be {{VectorUDT}}.
> Using the same {{Param}} for an {{array<T>}} (and possibly for
> {{array<array<T>>}} once {{PrefixSpan}} is ported to {{ml}}) will be
> confusing for users.
> I would like to suggest adding a new {{trait}} (let's say
> {{HasTransactionsCol}}) to clearly indicate that the input type differs from
> that of the other {{Estimators}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
