[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It goes over the list of fields twice and accumulates the term frequencies 
> from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is called from only one public method, 
> the overload of {{like}} that receives a {{Map}}, so the private class member 
> {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
> {{fields}}.
>  
> Put more plainly: by the time {{retrieveTerms}} gets called, its parameter 
> {{fields}} and the private member {{fieldNames}} always contain the same list 
> of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term {{accumulator}} appears 
> {{7}} times in the {{TID0009}} doc, but its TF was calculated as {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term {{accumulator}} to only one other document ({{TID0004}}), 
> where it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: (was: 
solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png)


[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Fix Version/s: 5.2.2


[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: LUCENE-6687.patch

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Flags: Patch,Important
Lucene Fields: New,Patch Available  (was: New)

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
External issue URL:   (was: 
https://docs.google.com/a/sematext.com/document/d/1oPjxj9dpw-sT2NhVN-HuFmCE_ouyrPNdDQLnCgfiyq8/edit?usp=sharing)

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from the fields 
that we provide in the MLT QP {{qf}} parameter. 
It goes over the list of fields twice and accumulates the term frequencies from 
all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is called from only one public method, the 
overload of {{like}} that receives a {{Map}}, so the private class member 
{{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}.
 
Put more plainly: by the time {{retrieveTerms}} gets called, its parameter 
{{fields}} and the private member {{fieldNames}} always contain the same list 
of fields.
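
The double counting described above can be sketched with plain collections. This is only an illustration of the reported loop shape, not the Lucene source: the class {{TfDoublingSketch}}, its methods, and the 1 + 6 split of {{accumulator}} across the two fields are all made up for the example, and it assumes that nothing inside the inner loop ties the two loop variables together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: this is NOT the Lucene source, just the loop shape
// the report describes, reduced to plain collections.
class TfDoublingSketch {

    // Hypothetical stand-in for the document: field name -> tokenized terms.
    // The 1 + 6 split of "accumulator" across the two fields is invented;
    // the report only gives the total of 7.
    static Map<String, List<String>> sampleDoc() {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("title_mlt", List.of("accumulator"));
        fields.put("pagetext_mlt", Collections.nCopies(6, "accumulator"));
        return fields;
    }

    // Reported shape: the outer loop walks fieldNames, the inner loop walks the
    // same fields again, and (assumption) nothing ties the two loop variables
    // together, so every term is counted once per outer iteration.
    static Map<String, Integer> countNested(List<String> fieldNames,
                                            Map<String, List<String>> fields) {
        Map<String, Integer> termFreqMap = new LinkedHashMap<>();
        for (String fieldName : fieldNames) {
            for (List<String> terms : fields.values()) {
                for (String term : terms) {
                    termFreqMap.merge(term, 1, Integer::sum);
                }
            }
        }
        return termFreqMap;
    }

    // Single pass over the supplied fields: each occurrence is counted once.
    static Map<String, Integer> countOnce(Map<String, List<String>> fields) {
        Map<String, Integer> termFreqMap = new LinkedHashMap<>();
        for (List<String> terms : fields.values()) {
            for (String term : terms) {
                termFreqMap.merge(term, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = sampleDoc();
        List<String> fieldNames = new ArrayList<>(doc.keySet());
        System.out.println("nested loops: " + countNested(fieldNames, doc));
        System.out.println("single loop:  " + countOnce(doc));
    }
}
```

With the two sample fields, the nested version reports 14 for {{accumulator}} and the single pass reports 7. In this sketch the multiplier equals the number of field names, so it is a doubling only because two fields are involved.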

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term {{accumulator}} appears 
{{7}} times in the {{TID0009}} doc, but its TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear fewer than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term {{accumulator}} to only one other document ({{TID0004}}), 
where it appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
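
The threshold arithmetic behind the two queries can be written out as a small check. It assumes the filter rule as stated in this description (a term is ignored when its frequency is below {{mintf}}); the class {{MinTfSketch}} and the method {{passesMinTf}} are hypothetical names, not Lucene or Solr API.

```java
// Illustrative sketch only; the names here are hypothetical, not Lucene API.
class MinTfSketch {

    // Assumed rule, taken from the description: terms that appear fewer than
    // minTf times are not considered.
    static boolean passesMinTf(int termFreq, int minTf) {
        return termFreq >= minTf;
    }

    public static void main(String[] args) {
        int actualTf = 7;             // occurrences of "accumulator" in TID0009
        int doubledTf = 2 * actualTf; // the TF value reported as calculated (14)
        for (int minTf : new int[] {14, 15}) {
            System.out.println("mintf=" + minTf
                    + " doubled TF kept: " + passesMinTf(doubledTf, minTf)
                    + ", actual TF kept: " + passesMinTf(actualTf, minTf));
        }
    }
}
```

Under these assumptions the term survives {{mintf=14}} only when the doubled count of 14 is used, and it is dropped at {{mintf=15}} with either count, which is presumably the contrast the two screenshots are meant to show.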

I should probably mention that multiple fields ({{qf}}) work because I applied 
the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].

Bug, no?



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from the fields 
that we provide in the MLT QP {{qf}} parameter. 
It goes over the list of fields twice and accumulates the term frequencies from 
all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is called from only one public method, the 
overload of {{like}} that receives a {{Map}}, so the private class member 
{{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}.
 
Put more plainly: by the time {{retrieveTerms}} gets called, its parameter 
{{fields}} and the private member {{fieldNames}} always contain the same list 
of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term {{accumulator}} appears 
{{7}} times in the {{TID0009}} doc, but its TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear fewer than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term {{accumulator}} to only one other document ({{TID0004}}), 
where it appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!

I should probably mention that multiple fields work because I applied the 
patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].

Bug, no?



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are two {{for}} loops, one nested inside the other, and both iterate over 
the same set of fields.
That effectively doubles the term frequency of every term from the fields that 
we provide in the MLT QP {{qf}} parameter. 
The method goes over the list of fields twice and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is called from only one public method, the 
overload of {{like}} that receives a {{Map}}, so the private class member 
{{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time the {{retrieveTerms}} method gets called, its parameter {{fields}} and 
the private member {{fieldNames}} always contain the same list of fields.
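
The loop structure described above can be modelled with a small Java sketch. 
It is a simplified stand-in, not the Lucene source: the class 
{{NestedLoopTfModel}}, the whitespace tokenizer and the sample field values are 
all made up for illustration; only the field names and {{termFreqMap}} come from 
the report.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the loop structure described above; NOT the Lucene source.
class NestedLoopTfModel {

    // Counts whitespace-separated terms across all fields.
    // With nested == true the field list is walked by an outer and an inner loop,
    // mirroring the structure the report describes.
    static Map<String, Integer> countTerms(Map<String, String> fields, boolean nested) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        if (nested) {
            for (String fieldName : fields.keySet()) {     // stands in for fieldNames
                for (String field : fields.keySet()) {     // the same fields again
                    addTermFrequencies(fields.get(field), termFreqMap);
                }
            }
        } else {
            for (String field : fields.keySet()) {         // single pass
                addTermFrequencies(fields.get(field), termFreqMap);
            }
        }
        return termFreqMap;
    }

    // Adds one count per occurrence of each term in the given text.
    static void addTermFrequencies(String text, Map<String, Integer> termFreqMap) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                termFreqMap.merge(term, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up document content; only the field names come from the report.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title_mlt", "accumulator glass");
        doc.put("pagetext_mlt", "accumulator accumulator how");

        System.out.println(countTerms(doc, false).get("accumulator")); // 3
        System.out.println(countTerms(doc, true).get("accumulator"));  // 6
    }
}
```

In this model the multiplier equals the number of fields, so the two fields 
used here give exactly the doubling described in the report.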

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term {{accumulator}} appears 
{{7}} times in doc {{TID0009}}, but its TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear fewer than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term {{accumulator}} to only one other document ({{TID0004}}), 
where it appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!

Bug, no?





> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):



Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?





> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.


There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?





> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verif

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: terms-how.png
terms-glass.png
terms-angry.png
terms-accumulator.png
solr-mlt-tf-doubling-bug.png
solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png
solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png
solr-mlt-tf-doubling-bug-results.png
buggy-method-usage.png

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
> Attachments: buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in MLT QP {{qf}} parameter. 
> It basically goes two times over the list of fields and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the version of overloaded method {{like}} that receives a Map: so that 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear less than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> Let's see what happens when we use {{mintf=15}}:
> Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?





> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, Trunk
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing

[jira] [Created] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)
Marko Bonaci created LUCENE-6687:


 Summary: MLT term frequency calculation bug
 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci


In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{{q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009}}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?







[jira] [Commented] (SOLR-7143) MoreLikeThis Query Parser does not handle multiple field names

2015-07-02 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612610#comment-14612610
 ] 

Marko Bonaci commented on SOLR-7143:


Not sure how this ended in my private e-mail.

We were going to suggest that they upgrade to 5.x so that, among other fixes
and improvements, they could use the new MLT QueryParser to solve the problem
with the non-functioning MLT handler in cloud mode (
https://issues.apache.org/jira/browse/SOLR-788), but now it seems that even
5.x would have to be patched (if those patches work).
On Thu, Jul 2, 2015 at 11:59 PM Otis Gospodnetic (JIRA) 



> MoreLikeThis Query Parser does not handle multiple field names
> --
>
> Key: SOLR-7143
> URL: https://issues.apache.org/jira/browse/SOLR-7143
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.0
>Reporter: Jens Wille
>Assignee: Anshum Gupta
> Attachments: SOLR-7143.patch, SOLR-7143.patch, SOLR-7143.patch, 
> SOLR-7143.patch, SOLR-7143.patch
>
>
> The newly introduced MoreLikeThis Query Parser (SOLR-6248) does not return 
> any results when supplied with multiple fields in the {{qf}} parameter.
> To reproduce within the techproducts example, compare:
> {code}
> curl 
> 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name%7DMA147LL/A'
> curl 
> 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=features%7DMA147LL/A'
> curl 
> 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name,features%7DMA147LL/A'
> {code}
> The first two queries return 8 and 5 results, respectively. The third query 
> doesn't return any results (not even the matched document).
> In contrast, the MoreLikeThis Handler works as expected (accounting for the 
> default {{mintf}} and {{mindf}} values in SimpleMLTQParser):
> {code}
> curl 
> 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name&mlt.mintf=1&mlt.mindf=1'
> curl 
> 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=features&mlt.mintf=1&mlt.mindf=1'
> curl 
> 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name,features&mlt.mintf=1&mlt.mindf=1'
> {code}
> After adding the following line to 
> {{example/techproducts/solr/techproducts/conf/solrconfig.xml}}:
> {code:language=XML}
> 
> {code}
> The first two queries return 7 and 4 results, respectively (excluding the 
> matched document). The third query returns 7 results, as one would expect.
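
For readers comparing the curl examples above with the {{q={!mlt ...}}} 
local-params syntax, the URL-encoded {{q}} value can be decoded with a few lines 
of Java. This is only a reading aid under stated assumptions: the class 
{{MltQueryDecoder}} is hypothetical and not part of Solr; it uses the standard 
{{java.net.URLDecoder}}.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for reading the curl examples; not part of Solr.
class MltQueryDecoder {

    // Decodes a URL-encoded q parameter value into its local-params form.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The q value from the third curl command in the quoted description.
        System.out.println(decode("%7B!mlt+qf=name,features%7DMA147LL/A"));
        // prints {!mlt qf=name,features}MA147LL/A
    }
}
```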






[jira] [Commented] (SOLR-2305) DataImportScheduler

2015-03-30 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387224#comment-14387224
 ] 

Marko Bonaci commented on SOLR-2305:


Just to add some additional info (in case someone stumbles here while searching 
for a solution).  
Since _Google Code_ is on its way out, I transferred the repo to GitHub (source, 
drop-in jar and usage docs).  
Here: https://github.com/mbonaci/solr-data-import-scheduler

> DataImportScheduler
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2305-1.diff, patch.txt
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?






[jira] [Commented] (SOLR-2305) DataImportScheduler

2012-10-22 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481621#comment-13481621
 ] 

Marko Bonaci commented on SOLR-2305:


[~otis]
Got it! Will do...

> DataImportScheduler
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2305) DataImportScheduler

2012-10-18 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479332#comment-13479332
 ] 

Marko Bonaci commented on SOLR-2305:


[~erickerickson]
Isn't the number of votes enough to push the issue? It's 14th from the top.

I assume that, before the patch gets committed, someone more experienced in Solr 
source standards should check the code.
Is it time to change the Assignee to myself and Status to 'Ready To Review'?

Thanks

> DataImportScheduler
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Comment Edited] (SOLR-2305) DataImportScheduler

2012-10-03 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468938#comment-13468938
 ] 

Marko Bonaci edited comment on SOLR-2305 at 10/4/12 9:54 AM:
-

[~newmanw10] Correct me if I'm wrong, but if I read the header of the issue 
correctly, it's planned to be included in 4.1 (affects version 4 which then 
becomes version 4.1).

  was (Author: mbonaci):
[~newmanw10] Correct me if I'm wrong, but if I read the header of the issue 
the correctly, it's planned to be included in 4.1 (affects version 4 which then 
becomes version 4.1).
  
> DataImportScheduler
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Commented] (SOLR-2305) DataImportScheduler

2012-10-03 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468938#comment-13468938
 ] 

Marko Bonaci commented on SOLR-2305:


[~newmanw10] Correct me if I'm wrong, but if I read the header of the issue 
correctly, it's planned to be included in 4.1 (affects version 4 which then 
becomes version 4.1).

> DataImportScheduler
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Comment Edited] (SOLR-2305) DataImportScheduler - Marko Bonaci

2012-10-03 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468935#comment-13468935
 ] 

Marko Bonaci edited comment on SOLR-2305 at 10/4/12 9:35 AM:
-

Can I please ask someone (whoever has the appropriate access level) to remove 
my name from the title of this issue.
I normally am an egocentric guy, but this, even for me, is slightly over the 
top :)

Thanks

  was (Author: mbonaci):
Can I please ask someone (whoever has the appropriate access level) to 
remove my name from the title of this issue.
I normally am egocentric guy, but this, even for me, is slightly too much, even 
for me :)

Thanks
  
> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2012-10-03 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468935#comment-13468935
 ] 

Marko Bonaci commented on SOLR-2305:


Can I please ask someone (whoever has the appropriate access level) to remove 
my name from the title of this issue.
I normally am an egocentric guy, but this, even for me, is slightly over the 
top :)

Thanks

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2012-08-15 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435343#comment-13435343
 ] 

Marko Bonaci commented on SOLR-2305:


Until the DIH Scheduler is included in the official Solr distribution, you can 
use the JAR file I published [here|http://code.google.com/p/solr-data-import-scheduler/], 
on Google Code.

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0-ALPHA
>Reporter: Bill Bell
> Fix For: 4.1
>
> Attachments: patch.txt, SOLR-2305-1.diff
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Issue Comment Edited] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-23 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053796#comment-13053796
 ] 

Marko Bonaci edited comment on SOLR-2305 at 6/23/11 11:17 AM:
--

This is the patch for adding DIHScheduler v1.2 to Solr.
I didn't know I could make a patch concerning only the 
org.apache.solr.handler.dataimport package :(
So finally, here it is.

Since I still have problems with the build path/packages in Eclipse, the patch 
wasn't tested at all and there are no unit tests.
Whoever ends up adding this, please feel free to contact me if the need 
arises.
Also, all criticism is more than welcome; I want to learn to do this the right 
way.
Thanks

  was (Author: mbonaci):
Patch for adding DIHScheduler v1.2 to Solr
  
> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
> Attachments: patch.txt
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Updated] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-23 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated SOLR-2305:
---

Attachment: patch.txt

Patch for adding DIHScheduler v1.2 to Solr

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
> Attachments: patch.txt
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-20 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051929#comment-13051929
 ] 

Marko Bonaci commented on SOLR-2305:


Hi Bill,
I had difficulties setting up the project in Eclipse, and although I 
succeeded in the end, I think that the patch file won't be usable 
(due to the many build path changes I made)?

All you have to do to incorporate DIHScheduler is to follow the instructions I 
posted here:
http://wiki.apache.org/solr/DataImportHandler#Scheduling

If you run into any kind of problem feel free to post the question here and 
I'll try to respond promptly.

Thank you.

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-14 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049180#comment-13049180
 ] 

Marko Bonaci commented on SOLR-2305:


I'll attach the patch during the following weekend.

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?




[jira] Commented: (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-01-03 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976665#action_12976665
 ] 

Marko Bonaci commented on SOLR-2305:


I'd like to help, but you'll have to explain to me how to do it.
What needs to be prepared? Java files? The whole project?
Do I have to check out the project in Eclipse? And, if yes, how do I get commit 
access rights?

I've never contributed before, obviously, but I'm interested in learning how to 
do it properly... Links?

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

