Hi,
I have a few questions regarding more like this:
1. In MoreLikeThis, it seems like the check for fieldNames being null and
fetching them from the reader is not done for all the like methods. For
example, it does not look like it is done at all for like(Reader r), and on
the other hand
you please use the Lucene code format? (Eclipse/IntelliJ
templates are at the bottom of
http://wiki.apache.org/lucene-java/HowToContribute )
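Point 1 above (the fieldNames null check not being applied in every like() method) could be handled by sending every like() overload through a single helper that fills in fieldNames lazily. This is a minimal sketch, not MoreLikeThis's actual code. FieldNameSource is a hypothetical stand-in for asking the IndexReader for its indexed field names, and LazyFieldNames, likeDoc and likeText are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for "ask the IndexReader which fields are indexed".
interface FieldNameSource {
    Collection<String> indexedFieldNames();
}

// Illustrative only: not the real MoreLikeThis class.
class LazyFieldNames {
    private final FieldNameSource source;
    private String[] fieldNames; // null means "use every indexed field"
    int lookups = 0;             // how often the source was consulted

    LazyFieldNames(FieldNameSource source, String[] fieldNames) {
        this.source = source;
        this.fieldNames = fieldNames;
    }

    // Every like(...) overload calls this first, so the null check lives in one place.
    String[] resolvedFieldNames() {
        if (fieldNames == null) {
            lookups++;
            Collection<String> names = source.indexedFieldNames();
            fieldNames = names.toArray(new String[0]);
        }
        return fieldNames;
    }

    // Hypothetical overloads mirroring like(int docNum) and like(Reader r).
    List<String> likeDoc(int docNum) {
        return describe("doc " + docNum);
    }

    List<String> likeText(String text) {
        return describe("text '" + text + "'");
    }

    // Records which fields each call would query.
    private List<String> describe(String what) {
        List<String> out = new ArrayList<>();
        for (String field : resolvedFieldNames()) {
            out.add(field + " <- " + what);
        }
        return out;
    }
}
```

Because both overloads go through resolvedFieldNames(), a text-based call gets the same field list as a document-based one, and the reader is only asked once.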
Extension to MoreLikeThis to use tag information
Key: LUCENE-1910
URL
[
https://issues.apache.org/jira/browse/LUCENE-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas D'Silva updated LUCENE-1910:
---
Attachment: (was: LUCENE-1910.patch)
Extension to MoreLikeThis to use tag information
, the time taken to generate a MoreLikeThisUsingTags query is
constant.
Thanks,
Thomas
Thomas D'Silva updated LUCENE-1910:
---
Attachment: LUCENE-1910.patch
Extension to MoreLikeThis to use tag information
[
https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1993:
--
Assignee: Michael McCandless
MoreLikeThis - allow to exclude terms
shortly.
MoreLikeThis - allow to exclude terms that appear in too many documents
(patch included)
Key: LUCENE-1993
URL: https://issues.apache.org/jira/browse/LUCENE-1993
Michael McCandless resolved LUCENE-1993.
Resolution: Fixed
Fix Version/s: 3.0
Thanks Christian!
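The LUCENE-1993 change lets MoreLikeThis skip terms that appear in too many documents, because a term found almost everywhere says little about similarity. The sketch below shows only that filtering step, applied to a plain map from candidate term to document frequency. DocFreqFilter, maxDocFreq and fromPercentage are assumptions for illustration and are not taken from the committed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DocFreqFilter {
    // Terms appearing in more than this many documents are dropped; <= 0 disables the check.
    private final int maxDocFreq;

    DocFreqFilter(int maxDocFreq) {
        this.maxDocFreq = maxDocFreq;
    }

    // Hypothetical helper: turns "at most N% of the index" into an absolute limit.
    static DocFreqFilter fromPercentage(int numDocs, int maxPercentage) {
        return new DocFreqFilter(numDocs * maxPercentage / 100);
    }

    // Keeps only candidate terms whose document frequency is within the limit.
    List<String> filter(Map<String, Integer> docFreqs) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (maxDocFreq > 0 && e.getValue() > maxDocFreq) {
                continue; // too common to help find similar documents
            }
            kept.add(e.getKey());
        }
        return kept;
    }
}
```

With 1,000 documents and a 50% limit, a term such as "the" that occurs in 950 documents is discarded, while rarer terms are kept.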
documents?
Unfortunately, I can't see this being generally useful until the performance is
improved dramatically.
document terms for a given are cached in a hashmap once they have been
generated in order to speed up subsequent lookups.
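Caching generated terms as described here amounts to memoizing the expensive generation step in a HashMap keyed by document, so later lookups for the same document skip the work. This is a minimal sketch and not the patch's code. The integer document key and the generator function (standing in for whatever reads a document's terms or tags) are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

class DocTermCache {
    private final Map<Integer, List<String>> cache = new HashMap<>();
    // Hypothetical expensive step, e.g. reading a document's stored tags or term vector.
    private final IntFunction<List<String>> generator;
    int misses = 0;

    DocTermCache(IntFunction<List<String>> generator) {
        this.generator = generator;
    }

    // Generates the terms on first request, then serves them from the map.
    List<String> termsFor(int docId) {
        List<String> terms = cache.get(docId);
        if (terms == null) {
            misses++;
            terms = generator.apply(docId);
            cache.put(docId, terms);
        }
        return terms;
    }

    // Entries go stale when the index changes, so callers need a way to reset.
    void clear() {
        cache.clear();
    }
}
```

A map like this grows without bound and does not notice index updates, so it only pays off when the same documents are queried repeatedly against an unchanged index.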
Thomas D'Silva updated LUCENE-1910:
---
Attachment: (was: LUCENE-1910.patch)
Extension to MoreLikeThis to use tag information
a lot of searches.
I need to spend a little more time looking at it before I understand it in more
detail.
Before then - have you tested this on a big (millions of docs/terms) index?
Some performance figures would be useful to accompany this.
Cheers,
Mark
Thomas D'Silva updated LUCENE-1910:
---
Priority: Minor (was: Major)
Extension to MoreLikeThis to use tag information
Key: LUCENE-1910
URL: https://issues.apache.org/jira/browse/LUCENE-1910
Project: Lucene - Java
Issue Type: New Feature
Components
Thomas D'Silva updated LUCENE-1910:
---
Attachment: LUCENE-1910.patch
Extension to MoreLikeThis to use tag information
Thomas D'Silva updated LUCENE-1910:
---
Attachment: (was: LUCENE-1910.patch)
Extension to MoreLikeThis to use tag information
Thomas D'Silva updated LUCENE-1910:
---
Attachment: LUCENE-1910.patch
Extension to MoreLikeThis to use tag information
Hi,
I would like to contribute a class based on the MoreLikeThis class in
contrib/queries that generates a query based on the tags associated
with a document. The class assumes that documents are tagged with a
set of tags (which are stored in the index in a separate Field). The
class determines
is in the
IndexReader, but suspect that would be a can of worms. Comments?
Morelikethis queries are very slow compared to other search types
-
Key: LUCENE-1690
URL: https://issues.apache.org/jira/browse
I'm confused: how come you are not already seeing the benefits of this
cache? You ought to see MLT queries going faster. This core cache was first
added in 2.4.x; it looks like you were testing against 2.4.1 (from the Affects
Version on this issue).
On Thu, Jul 30, 2009 at 6:28 AM, Richard Marr <richard.m...@gmail.com> wrote:
Yeah, having this stuff stored centrally behind the IndexReader seems
like a better idea than having it in client classes. My shallow
knowledge of the code isn't helping me explain why it's not performing
though.
Out
2009/7/30 Michael McCandless <luc...@mikemccandless.com>:
Good question...
Good answer. Thanks.
I guess the next step then is to understand why the TermInfo cache
isn't getting the performance to where it could be. It'll take me a
while to get to the point where I can answer that question. If
that is not MLT related.
A lot of MLTs use the same terms, and I have a good-sized cache for it, meaning
most terms I use in MLT can be retrieved from there. Seeing as MLT in my
circumstance is one of the slower bits, this can give me a good advantage.
Morelikethis queries are very slow compared to other
On 7/30/09 4:10 AM, Michael McCandless wrote:
Plus, the original motivation for this (LUCENE-1195) was because
queries in general look up the same term at least 2 times during their
execution (weight (idf computation), get postings), and so I think we
wanted to ensure that a single thread doing
noticed. Please ignore the latest patch.
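The LUCENE-1195 motivation quoted above is that a query looks up the same term at least twice: once for the idf in its weight and once for its postings. A small per-thread cache removes the second lookup. Below is a rough sketch using a ThreadLocal map. TermDictionary is a hypothetical interface, and returning a plain docFreq instead of a TermInfo is a simplification; neither reflects Lucene's internal classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the on-disk term dictionary.
interface TermDictionary {
    int docFreq(String term); // imagine an expensive seek here
}

class PerThreadTermCache {
    private final TermDictionary dict;
    // Each thread gets its own map, so lookups need no locking.
    private final ThreadLocal<Map<String, Integer>> local =
        ThreadLocal.withInitial(HashMap::new);

    PerThreadTermCache(TermDictionary dict) {
        this.dict = dict;
    }

    // The first call for a term seeks the dictionary; later calls on the same thread hit the map.
    int docFreq(String term) {
        return local.get().computeIfAbsent(term, dict::docFreq);
    }
}
```

Unlike a cache owned by each MoreLikeThis instance, this one sits next to the dictionary, so every query type benefits. It still needs a size bound in practice, since this sketch never evicts.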
some feedback in the meantime?
like it'll incorrectly put 0 into the cache,
when the field was in the top-level cache but the term text wasn't in the 2nd
level cache?
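The problem described here is typical of a two-level cache (field -> term text -> docFreq). Finding the field in the outer map does not mean the term is in the inner one, so a second-level miss must fall through to a real lookup instead of storing 0. Below is a sketch of the corrected lookup. DocFreqSource is a hypothetical interface standing in for the IndexReader docFreq call, and TwoLevelDocFreqCache is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the reader's docFreq lookup.
interface DocFreqSource {
    int docFreq(String field, String text);
}

class TwoLevelDocFreqCache {
    private final Map<String, Map<String, Integer>> byField = new HashMap<>();
    private final DocFreqSource source;

    TwoLevelDocFreqCache(DocFreqSource source) {
        this.source = source;
    }

    int docFreq(String field, String text) {
        // Level 1: create the per-field map the first time a field is seen.
        Map<String, Integer> byText = byField.computeIfAbsent(field, f -> new HashMap<>());
        // Level 2: a missing term triggers a real lookup; it must never default to 0.
        Integer cached = byText.get(text);
        if (cached == null) {
            cached = source.docFreq(field, text);
            byText.put(text, cached);
        }
        return cached;
    }
}
```

The second assertion in the accompanying check covers exactly the reported case: the field is already cached, but the term is new.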
[
https://issues.apache.org/jira/browse/LUCENE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch closed LUCENE-1697.
-
Resolution: Duplicate
This will be fixed as part of LUCENE-1460.
MoreLikeThis should use
and this is unbounded. Perhaps this
should be an LRU cache with a settable maximum number of entries, to stop it
growing forever if you run a lot of MoreLikeThis queries on large indexes with
many unique terms.
Otherwise a nice addition; it has sped up my MoreLikeThis queries a bit.
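The LRU suggestion above can be sketched with java.util.LinkedHashMap in access order. Its removeEldestEntry hook evicts the least recently used entry once the map grows past a maximum size. LruCache is an illustrative name, and using it to hold per-term values is an assumption; this is not the code from the LUCENE-1690 patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least recently used entry once over the limit.
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a cache shared between searching threads would need to be wrapped with Collections.synchronizedMap or guarded some other way.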
Morelikethis queries
binding to a specific IndexReader
instance. I think I can handle that.
Carl, do you have any data on how this has changed performance in your system?
My use case is a limited vocabulary so the performance gain was large.
[
https://issues.apache.org/jira/browse/LUCENE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1272.
Resolution: Fixed
Thanks Jonathan!
Support for boost factor in MoreLikeThis
MoreLikeThis should use the new Token API
-
Key: LUCENE-1697
URL: https://issues.apache.org/jira/browse/LUCENE-1697
Project: Lucene - Java
Issue Type: Improvement
Reporter: Grant Ingersoll
don't want this one, Grant, we should assign it to Michael, as this is part of
LUCENE-1460.
a little longer for me to do. I'll have
a think about it.
It shouldn't affect any applications that don't opt in to using it, and
applications that do should see an order-of-magnitude performance improvement
for MLT queries.
This cache implementation is tied to the MLT object but can be cleared on
demand.
Morelikethis queries are very slow compared
include the IndexReader in the cache key? Then it'd be functionally
equivalent, and we could enable it by default?
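Including the IndexReader in the cache key, as suggested here, means a cached value can never leak from one reader (one index snapshot) to another. That is what would make the cache safe to enable by default. Below is a minimal sketch using a WeakHashMap keyed on the reader object, so a reader's entries become collectable once nothing else references that reader. The reader is modelled as a plain Object, and DocFreqLookup and ReaderKeyedCache are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for reader.docFreq(term).
interface DocFreqLookup {
    int docFreq(Object reader, String term);
}

class ReaderKeyedCache {
    // Outer key: the reader instance, held weakly so a dropped reader drops its entries.
    private final Map<Object, Map<String, Integer>> perReader = new WeakHashMap<>();
    private final DocFreqLookup lookup;

    ReaderKeyedCache(DocFreqLookup lookup) {
        this.lookup = lookup;
    }

    // Values are cached per (reader, term), so two readers never share an entry.
    synchronized int docFreq(Object reader, String term) {
        Map<String, Integer> terms = perReader.computeIfAbsent(reader, r -> new HashMap<>());
        Integer df = terms.get(term);
        if (df == null) {
            df = lookup.docFreq(reader, term);
            terms.put(term, df);
        }
        return df;
    }

    // Still allows the "clear on demand" behaviour mentioned above.
    synchronized void clear() {
        perReader.clear();
    }
}
```

The weak key only helps if the cached values do not hold a strong reference back to the reader. Here they are just strings and integers.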
in MoreLikeThis
Key: LUCENE-1272
URL: https://issues.apache.org/jira/browse/LUCENE-1272
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Jonathan
for you to update this patch to work with the
trunk, so I can apply it? Thanks!
Actually, my copy of MLT already takes a Similarity in its constructor and has
set/getSimilarity, so no patch is needed. Do you want/need that isNoise method
to be protected?
Let users set Similarity for MoreLikeThis
-
Key: LUCENE-896
Fix Version/s: 2.9
Assignee: Otis Gospodnetic
I don't see any harm in this, I'll make the change later this week.
MoreLikeThis ignores custom similarity
--
Key: LUCENE-1298
URL: https://issues.apache.org/jira/browse/LUCENE-1298
Project: Lucene - Java
Issue Type: Bug
Reporter: Grant Ingersoll
[
https://issues.apache.org/jira/browse/LUCENE-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated LUCENE-1298:
Attachment: LUCENE-1298.patch
Patch
MoreLikeThis ignores custom similarity
revision 662413.
Make retrieveTerms(int docNum) public in MoreLikeThis
-
Key: LUCENE-1295
URL: https://issues.apache.org/jira/browse/LUCENE-1295
Project: Lucene - Java
Issue Type
?
{quote}
I see MLT is full of tabs, should you feel like fixing the formatting.
{quote}
Yeah, I noticed that too, and it is quite egregious. I thought we avoided
formatting changes, but I am happy to make an exception here.
Make retrieveTerms(int docNum) public in MoreLikeThis
Seems very reasonable. I'll commit on Monday.
Let users set Similarity for MoreLikeThis
-
Key: LUCENE-896
URL: https://issues.apache.org/jira/browse/LUCENE-896
Project: Lucene - Java
Issue Type
in MoreLikeThis
Key: LUCENE-1272
URL: https://issues.apache.org/jira/browse/LUCENE-1272
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Jonathan Leibiusky
Let users set Similarity for MoreLikeThis
-
Key: LUCENE-896
URL: https://issues.apache.org/jira/browse/LUCENE-896
Project: Lucene - Java
Issue Type: Improvement
Components: Other
for Similarity.
This also fixes a couple of javadoc typos and makes isNoiseWord() protected.
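Making isNoiseWord() protected lets an application change which terms are discarded by subclassing instead of patching. The sketch below mirrors the kind of checks such a method performs (minimum and maximum word length, plus a stop-word set) in a standalone base class. NoiseFilter, NumericAwareNoiseFilter and their fields are illustrative stand-ins, not the real MoreLikeThis members.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class NoiseFilter {
    protected int minWordLen = 0; // 0 disables the check
    protected int maxWordLen = 0; // 0 disables the check
    protected Set<String> stopWords = Collections.emptySet();

    // Protected so subclasses can refine what counts as noise.
    protected boolean isNoiseWord(String term) {
        int len = term.length();
        if (minWordLen > 0 && len < minWordLen) return true;
        if (maxWordLen > 0 && len > maxWordLen) return true;
        return stopWords.contains(term);
    }
}

// Hypothetical subclass: also treats purely numeric terms as noise.
class NumericAwareNoiseFilter extends NoiseFilter {
    NumericAwareNoiseFilter() {
        minWordLen = 2;
        stopWords = new HashSet<>(Collections.singletonList("the"));
    }

    @Override
    protected boolean isNoiseWord(String term) {
        return super.isNoiseWord(term) || term.chars().allMatch(Character::isDigit);
    }
}
```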
Hi,
Lucene is completely new to me. I just downloaded 1.9.1 and started
experimenting with it. I am a bit confused though. I want to use the
MoreLikeThis class, which appears in the javadoc, but does not exist in code.
Where can I find it?
Dean
: Lucene is completely new to me. I just downloaded 1.9.1 and started
: experimenting with it. I am a bit confused though. I want to use the
: MoreLikeThis class, which appears in the javadoc, but does not exist in
: code. Where can I find it?
if you look at the way the main javadoc index