[jira] [Comment Edited] (SPARK-20180) Add a special value for unlimited max pattern length in Prefix span, and set it as default.
[ https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953201#comment-15953201 ] Cyril de Vogelaere edited comment on SPARK-20180 at 4/3/17 9:57 AM: => Why not let the default be Int.MaxValue? I'm also ok with a default Int.MaxValue, if the special value zero is really something you are against. if that's what this is about, update the title to reflect it. => I will gladly do that, if you think the current title is misleading. This is a behavior change by default, so we should think carefully about it => Yes, I agree. What are the downsides – why would someone have ever made it 10? presumably, performance. => The changed code consist simply in an additionnal condition in an if. If you want to see a graph, I have one that test the differences in performances, but on my implementation optimised for single-item pattern. So it wouldn't be relevant, if you are worried of performance drop, I can do additional tests, on the two lines I changed. If you want me to use some particular dataset, I will also gladly oblige. Just say the word, and you will have them by tommorow. So it would be less about what impact it has on the performance, since it would be negligeable (again, i'm ready to prove that if you want me to), but about whether that feature seems needed or not. Which I agree, is debatable. Also, whichever senior implemented it that way, left this comment : // TODO: support unbounded pattern length when maxPatternLength = 0 Which you can find in the original code, and is the reason I created this Jira's thread first. Among the list of improvement I want to propose. You can find that line in the PrefixSpan code if you don't believe me.If theses change are rejected, then when I have the occasion, I will remove that line. Since this thread would have established that it isn't needed. You mention tests don't end and haven't established it's not due to your change. => I'm establishing that right now ... as I said. Also, they are ending, but they are really really slow. I don't think we can proceed with this in this state, right? => I will leave the decision to you was (Author: syrux): => Why not let the default be Int.MaxValue? I'm also ok with a default Int.MaxValue, if the special value zero is really something you are against. if that's what this is about, update the title to reflect it. => I will gladly do that, if you think the current title is misleading. This is a behavior change by default, so we should think carefully about it => Yes, I agree. What are the downsides – why would someone have ever made it 10? presumably, performance. => The changed code consist simply in an additionnal condition in an if. If you want to see a graph, I have one that test the differences in performances, but on my implementation optimised for single-item pattern. So it wouldn't be relevant, if you are worried of performance drop, I can do additional tests, on the two lines I changed. If you want me to use some particular dataset, I will also gladly oblige. Just say the word, and you will have them by tommorow. So it would be less about what impact it has on the performance, since it would be negligeable (again, i'm ready to prove that if you want me to), but about whether that feature seems needed or not. Which I agree, is debatable. Also, whichever senior implemented it that way, left this comment : // TODO: support unbounded pattern length when maxPatternLength = 0 Which you can find in the original code, and is the reason I created this Jira's thread first. Among the list of improvement I want to propose. You can find that line in the PrefixSpan code if you don't believe me.If theses change are rejected, then when I have the occasion, I will remove that line. So it would establish it isn't needed. You mention tests don't end and haven't established it's not due to your change. => I'm establishing that right now ... as I said. Also, they are ending, but they are really really slow. I don't think we can proceed with this in this state, right? => I will leave the decision to you > Add a special value for unlimited max pattern length in Prefix span, and set > it as default. > --- > > Key: SPARK-20180 > URL: https://issues.apache.org/jira/browse/SPARK-20180 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.1.0 >Reporter: Cyril de Vogelaere >Priority: Minor > Original Estimate: 0h > Remaining Estimate: 0h > > Right now, we need to use .setMaxPatternLength() method to > specify is the maximum pattern length of a sequence. Any pattern longer than > that won't be outputted. > The current default maxPatt
[jira] [Comment Edited] (SPARK-20180) Add a special value for unlimited max pattern length in Prefix span, and set it as default.
[ https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953201#comment-15953201 ] Cyril de Vogelaere edited comment on SPARK-20180 at 4/3/17 9:45 AM: => Why not let the default be Int.MaxValue? I'm also ok with a default Int.MaxValue, if the special value zero is really something you are against. if that's what this is about, update the title to reflect it. => I will gladly do that, if you think the current title is misleading. This is a behavior change by default, so we should think carefully about it => Yes, I agree. What are the downsides – why would someone have ever made it 10? presumably, performance. => The changed code consist simply in an additionnal condition in an if. If you want to see a graph, I have one that test the differences in performances, but on my implementation optimised for single-item pattern. So it wouldn't be relevant, if you are worried of performance drop, I can do additional tests, on the two lines I changed. If you want me to use some particular dataset, I will also gladly oblige. Just say the word, and you will have them by tommorow. So it would be less about what impact it has on the performance, since it would be negligeable (again, i'm ready to prove that if you want me to), but about whether that feature seems needed or not. Which I agree, is debatable. Also, whichever senior implemented it that way, left this comment : // TODO: support unbounded pattern length when maxPatternLength = 0 Which you can find in the original code, and is the reason I created this Jira's thread first. Among the list of improvement I want to propose. You can find that line in the PrefixSpan code if you don't believe me.If theses change are rejected, then when I have the occasion, I will remove that line. So it would establish it isn't needed. You mention tests don't end and haven't established it's not due to your change. => I'm establishing that right now ... as I said. Also, they are ending, but they are really really slow. I don't think we can proceed with this in this state, right? => I will leave the decision to you was (Author: syrux): => Why not let the default be Int.MaxValue? I'm also ok with a default Int.MaxValue, if the special value zero is really something you are against. if that's what this is about, update the title to reflect it. => I will gladly do that, if you think the current title is misleading. This is a behavior change by default, so we should think carefully about it => Yes, I agree. What are the downsides – why would someone have ever made it 10? presumably, performance. => The changed code consist simply in an additionnal condition in an if. If you want to see a graph, I have one that test the differences in performances, but on my implementation optimised for single-item pattern. So it wouldn't be relevant, if you are worried of performance drop, I can do additional tests, on the two lines I changed. If you want me to use some particular dataset, I will also gladly oblige. Just say the word, and you will have them by tommorow. So it would be less about what impact it has on the performance, since it would be negligeable (again, i'm ready to prove that if you want me to), but about whether that feature seems needed or not. Which I agree, is debatable. Also, whichever senior implemented it that way, left this comment : // TODO: support unbounded pattern length when maxPatternLength = 0 Which you can find in the original code, and is the reason I created this Jira's thread first. Among the list of improvement I want to propose. You can find that line in the PrefixSpan code if you don't believe me.If theses change are rejected, then when I have the occasion, I will remove that line. So it would establish it isn't needed. You mention tests don't end and haven't established it's not due to your change. => I'm establishing that right now ... as I said. I don't think we can proceed with this in this state, right? => I will leave the decision to you > Add a special value for unlimited max pattern length in Prefix span, and set > it as default. > --- > > Key: SPARK-20180 > URL: https://issues.apache.org/jira/browse/SPARK-20180 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.1.0 >Reporter: Cyril de Vogelaere >Priority: Minor > Original Estimate: 0h > Remaining Estimate: 0h > > Right now, we need to use .setMaxPatternLength() method to > specify is the maximum pattern length of a sequence. Any pattern longer than > that won't be outputted. > The current default maxPatternlength value being 10. > This should be changed so that with input 0, all patt