[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953201#comment-15953201
 ] 

Cyril de Vogelaere edited comment on SPARK-20180 at 4/3/17 9:57 AM:
--------------------------------------------------------------------

=> Why not let the default be Int.MaxValue?
I'm also ok with a default Int.MaxValue, if the special value zero is really 
something you are against.

if that's what this is about, update the title to reflect it.
=> I will gladly do that, if you think the current title is misleading.

This is a behavior change by default, so we should think carefully about it
=> Yes, I agree.

What are the downsides – why would someone have ever made it 10? presumably, 
performance.
=> The changed code consist simply in an additionnal condition in an if. If you 
want to see a graph, I have one that test the differences in performances, but 
on my implementation optimised for single-item pattern. So it wouldn't be 
relevant, if you are worried of performance drop, I can do additional tests, on 
the two lines I changed. If you want me to use some particular dataset, I will 
also gladly oblige. Just say the word, and you will have them by tommorow.

So it would be less about what impact it has on the performance, since it would 
be negligeable (again, i'm ready to prove that if you want me to), but about 
whether that feature seems needed or not. Which I agree, is debatable.

Also, whichever senior implemented it that way, left this comment : 
// TODO: support unbounded pattern length when maxPatternLength = 0
Which you can find in the original code, and is the reason I created this 
Jira's thread first. Among the list of improvement I want to propose.
You can find that line in the PrefixSpan code if you don't believe me.If theses 
change are rejected, then when I have the occasion, I will remove that line. 
Since this thread would have established that it isn't needed.

You mention tests don't end and haven't established it's not due to your 
change. 
=> I'm establishing that right now ... as I said. Also, they are ending, but 
they are really really slow.

I don't think we can proceed with this in this state, right?
=> I will leave the decision to you





was (Author: syrux):
=> Why not let the default be Int.MaxValue?
I'm also ok with a default Int.MaxValue, if the special value zero is really 
something you are against.

if that's what this is about, update the title to reflect it.
=> I will gladly do that, if you think the current title is misleading.

This is a behavior change by default, so we should think carefully about it
=> Yes, I agree.

What are the downsides – why would someone have ever made it 10? presumably, 
performance.
=> The changed code consist simply in an additionnal condition in an if. If you 
want to see a graph, I have one that test the differences in performances, but 
on my implementation optimised for single-item pattern. So it wouldn't be 
relevant, if you are worried of performance drop, I can do additional tests, on 
the two lines I changed. If you want me to use some particular dataset, I will 
also gladly oblige. Just say the word, and you will have them by tommorow.

So it would be less about what impact it has on the performance, since it would 
be negligeable (again, i'm ready to prove that if you want me to), but about 
whether that feature seems needed or not. Which I agree, is debatable.

Also, whichever senior implemented it that way, left this comment : 
// TODO: support unbounded pattern length when maxPatternLength = 0
Which you can find in the original code, and is the reason I created this 
Jira's thread first. Among the list of improvement I want to propose.
You can find that line in the PrefixSpan code if you don't believe me.If theses 
change are rejected, then when I have the occasion, I will remove that line. So 
it would establish it isn't needed.

You mention tests don't end and haven't established it's not due to your 
change. 
=> I'm establishing that right now ... as I said. Also, they are ending, but 
they are really really slow.

I don't think we can proceed with this in this state, right?
=> I will leave the decision to you




> Add a special value for unlimited max pattern length in Prefix span, and set 
> it as default.
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20180
>                 URL: https://issues.apache.org/jira/browse/SPARK-20180
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.1.0
>            Reporter: Cyril de Vogelaere
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use .setMaxPatternLength() method to
> specify is the maximum pattern length of a sequence. Any pattern longer than 
> that won't be outputted.
> The current default maxPatternlength value being 10.
> This should be changed so that with input 0, all pattern of any length would 
> be outputted. Additionally, the default value should be changed to 0, so that 
> a new user could find all patterns in his dataset without looking at this 
> parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to