[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850204#action_12850204
 ] 

Grant Ingersoll commented on SOLR-1804:
---

We should be able to go through with this now, right?

> Upgrade Carrot2 to 3.2.0
> 
>
> Key: SOLR-1804
> URL: https://issues.apache.org/jira/browse/SOLR-1804
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Clustering
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>
> http://project.carrot2.org/release-3.2.0-notes.html
> Carrot2 is now LGPL free, which means we should be able to bundle the binary!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845474#action_12845474
 ] 

Robert Muir commented on SOLR-1804:
---

Thanks for confirming that the clusters are OK.

Well, this is embarrassing: it turns out this is a backwards-compatibility 
break, though a documented one, and the culprit is yours truly.

This is why the test gets different results:
{noformat}
* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
  This means that terms with a position increment gap of zero do not
  affect the norms calculation by default.  (Robert Muir)
{noformat}

I'll change the test to expect 15 clusters with Lucene 3.1, thanks :)
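
For readers unfamiliar with LUCENE-2286, the effect on norms can be illustrated 
with a minimal Java sketch. It is not Lucene's code: the class and method names 
are made up for illustration, field boost and Lucene's lossy one-byte norm 
encoding are left out, and it only assumes the classic length norm of 
1/sqrt(number of terms), with overlap tokens (position increment of zero) 
subtracted when discounting is on.

```java
public class DiscountOverlapsSketch {

    /** Classic length norm: shorter fields get a higher norm. */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Hypothetical stand-in for the norm computation.
     *
     * @param length           total tokens in the field, overlaps included
     * @param numOverlap       tokens with a position increment of zero
     * @param discountOverlaps whether overlap tokens are ignored
     */
    public static float computeNorm(int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Made-up field: 16 tokens, 7 of them stacked on another token's position.
        System.out.println(computeNorm(16, 7, false)); // norm over 16 terms
        System.out.println(computeNorm(16, 7, true));  // norm over 9 terms, so higher
    }
}
```

With discounting on, a field containing stacked tokens gets a higher norm than 
before, so scores shift slightly. That is presumably enough to change what the 
clustering test sees between the 3.0 and 3.1 jars.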




[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845462#action_12845462
 ] 

Stanislaw Osinski commented on SOLR-1804:
-

Yeah, the clusters look good. When you're done with upgrading Lucene to 3.x, we 
could also upgrade Carrot2 to version 3.2.0, which is LGPL-free and could be 
distributed together with Solr.

S.




[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845459#action_12845459
 ] 

Stanislaw Osinski commented on SOLR-1804:
-

I was about to offer advice similar to Grant's, but wanted to wait to confirm 
the scope of changes.

If it was only a Lucene dependency update, and assuming the update didn't 
change the documents fed to Carrot2 in tests, the results shouldn't change. 
Carrot2 uses Lucene interfaces internally, but the tokenizer is not the 
standard Lucene one, so there should be no Version.LUCENE_* issues as far as I 
can tell.

I haven't got the Solr code handy, but maybe the test performs clustering on 
summaries generated from the original test documents and Lucene 3.x introduces 
some changes in the way summaries are generated?

If the clusters look reasonable, the problem is probably not critical, but 
still worth investigating to make sure it's not a bug of some kind.

S.





[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845455#action_12845455
 ] 

Robert Muir commented on SOLR-1804:
---

Grant, I am concerned about a possible backwards-compatibility (BW) break in 
Lucene trunk, that is all. I think it's strange that the 3.0 and 3.1 jars give 
different results.

Can you tell me if the clusters are reasonable? Here is the output.

{noformat}
junit.framework.AssertionFailedError: number of clusters: [
{labels=[Data Mining Applications], docs=[5, 13, 25, 12, 27],clusters=[]}, 
{labels=[Databases],docs=[15, 21, 7, 17, 11],clusters=[]}, 
{labels=[Knowledge Discovery],docs=[6, 18, 15, 17, 10],clusters=[]}, 
{labels=[Statistical Data Mining],docs=[28, 24, 2, 14],clusters=[]}, 
{labels=[Data Mining Solutions],docs=[5, 22, 8],clusters=[]}, 
{labels=[Data Mining Techniques],docs=[12, 2, 14],clusters=[]}, 
{labels=[Known as Data Mining],docs=[23, 17, 19],clusters=[]}, 
{labels=[Text Mining],docs=[6, 9, 29],clusters=[]}, 
{labels=[Dedicated],docs=[10, 11],clusters=[]}, 
{labels=[Extraction of Hidden Predictive],docs=[3, 11],clusters=[]}, 
{labels=[Information from Large],docs=[3, 7],clusters=[]}, 
{labels=[Neural Networks],docs=[12, 1],clusters=[]}, 
{labels=[Open],docs=[15, 20],clusters=[]}, 
{labels=[Research],docs=[26, 8],clusters=[]}, 
{labels=[Other Topics],docs=[16],clusters=[]}
] expected:<16> but was:<15>
{noformat}




[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845453#action_12845453
 ] 

Grant Ingersoll commented on SOLR-1804:
---

Robert, instead of tracking it down by brute force, you might just dump out the 
clusters and see if they are still reasonable.  If they are, I wouldn't worry 
too much about it, as it is likely due to the issues Staszek mentioned.




[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845451#action_12845451
 ] 

Robert Muir commented on SOLR-1804:
---

Hi Stanislaw:

Correct, I did not upgrade anything else, just Lucene. 

I'm sorry it's not exactly related to this issue 
(although if we need to upgrade Carrot2 to be compatible with Lucene 3.x, then 
that's OK).

My concern is more that we did something in Lucene between 3.0 
and now that caused the results to be different... though again
this could be explained if somewhere in its code Carrot2 uses some
Lucene analysis component, but doesn't hardwire Version to LUCENE_29.

If all else fails I can try to seek out the svn rev # of Lucene that causes 
this change,
by brute force binary search :)
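
The binary search over revisions mentioned above could look roughly like the 
Java sketch below. The class name, the revision numbers, and the predicate are 
all hypothetical; in practice the predicate would mean "build Lucene at this 
revision and check whether the clustering test now yields 15 clusters". The 
sketch also assumes the behavior flips exactly once within the range.

```java
import java.util.function.IntPredicate;

public class RevisionBisect {

    /**
     * Returns the first revision in (good, bad] for which isBad holds.
     * Assumes isBad is false at 'good', true at 'bad', and flips only once.
     */
    public static int firstBadRevision(int good, int bad, IntPredicate isBad) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(mid)) {
                bad = mid;   // change is at mid or earlier
            } else {
                good = mid;  // change is after mid
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Stand-in predicate with made-up numbers: pretend revision 1042
        // is the one that introduced the change.
        IntPredicate yieldsDifferentClusters = rev -> rev >= 1042;
        System.out.println(firstBadRevision(1000, 1100, yieldsDifferentClusters));
    }
}
```

Each step halves the range, so even a span of several thousand revisions needs 
only about a dozen builds.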




[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845441#action_12845441
 ] 

Stanislaw Osinski commented on SOLR-1804:
-

Hi Robert,

The Lucene dependency is the only change, right? Or did you also upgrade 
Carrot2 from, e.g., 3.1 to 3.2? If the latter is the case, the number of 
clusters may have changed, e.g. because we tuned stop words or other algorithm 
attributes.

S.






[jira] Commented: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845301#action_12845301
 ] 

Robert Muir commented on SOLR-1804:
---

I wonder if you guys have any insight into why the results of this test may 
have changed from 16 to 15 between Lucene 3.0 and Lucene 3.1-dev: 
http://svn.apache.org/viewvc?view=revision&revision=923048

It did not change between Lucene 2.9 and Lucene 3.0, so I'm concerned about why 
the results would change between 3.0 and 3.1-dev. 

One possible explanation would be if Carrot2 used Version.LUCENE_CURRENT 
somewhere in its code. Any ideas?
