[ANNOUNCE] Apache PyLucene 2.9.4 and 3.0.3
I am pleased to announce the availability of the Apache PyLucene 2.9.4 and 3.0.3 releases. Apache PyLucene, a subproject of Apache Lucene, is a Python extension for accessing Java Lucene. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest versions of Java Lucene, 2.9.4 and 3.0.3. This release contains a number of bug fixes and improvements. Details can be found in the changes files:

http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_2_9_4/CHANGES
http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_0_3/CHANGES
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

Apache PyLucene is available from the following download pages:

http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-2.9.4-1-src.tar.gz
http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.0.3-1-src.tar.gz

When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:

http://www.apache.org/dist/lucene/pylucene/KEYS

For more information on Apache PyLucene, visit the project home page:

http://lucene.apache.org/pylucene

Andi..
Solr-trunk - Build # 1344 - Failure
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1344/ All tests passed Build Log (for compile errors): [...truncated 20199 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
LogMergePolicy.setUseCompoundFile/DocStore
Hi I find it very annoying that I need to set true/false on both of these methods whenever I want to control compound file creation. Is it really necessary to allow writing the doc stores as non-compound files while the other index files go into a compound file? Does somebody know if this feature is used anywhere? If it's crucial to keep the two methods, then how about introducing a setCompoundMode(true/false) to turn both on and off at once? IndexWriter used to have that, before we switched to IndexWriterConfig, and I think it was very useful. Shai
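A sketch of the proposed convenience as a caller-side helper (setCompoundMode itself is hypothetical; the two setters it wraps do exist on LogMergePolicy in 3.x/trunk):
{code}
// hedged sketch: wrap both flags until/unless a real setCompoundMode() is added
static void setCompoundMode(LogMergePolicy policy, boolean compound) {
  policy.setUseCompoundFile(compound);      // compound file for the "regular" index files
  policy.setUseCompoundDocStore(compound);  // compound file for stored fields / term vectors
}
{code}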
Re: LogMergePolicy.setUseCompoundFile/DocStore
Incoming LUCENE-2814 drops setUseCompoundDocStore() On Thu, Dec 16, 2010 at 12:04, Shai Erera ser...@gmail.com wrote: Hi I find it very annoying that I need to set true/false on these methods whenever I want to control compound files creation. Is it really necessary to allow writing doc stores in non compound files vs. the other index files in a compound file? Does somebody know if this feature is used somewhere? If it's crucial to keep the two methods, then how about introducing a setCompoundMode(true/false) to turn on/off both at once? IndexWriter used to have it, before we switched to IndexWriterConfig and I think it was very useful. Shai -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: strange problem of PForDelta decoder
hi Michael, lucene 4 has so many changes that I don't know how to index and search with a specified codec. Could you please give me some code snippets that use the PFor codec so I can trace the code? In your blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html you said "The AND query, curiously, got faster; I think this is because I modified its scoring to first try seeking within the block of decoded ints." I am also curious about that result, because VINT only needs to decode part of the doc list while PFor needs to decode the whole block. But I think with conjunction queries the main time is spent searching the skip list. I haven't read your code yet, but I guess the skip list for VINT and the skip list for PFor are different. E.g. lucene 2.9's default skipInterval is 16, so it looks like

level 1: 256
level 0: 16 32 48 64 80 96 112 128 ... 256

When we need skipTo(60) we have to read 0 16 32 48 64 in level 0. But when using blocks, e.g. a block size of 128, my implementation of the skip list is

level 1: 256
level 0: 128 256

so when we skipTo(60) we only read 2 items in level 0 and decode the first block, which contains 128 docIDs. How do you implement bulk read? I did it like this: I decode a block and cache it in an int array. I think I can buffer up to 100K docIDs and tfs for disjunction queries (it costs less than 1MB of memory per term).

SegmentTermDocs.read(final int[] docs, final int[] freqs)
...
while (i < length && count < df) {
  if (curBlockIdx >= curBlockSize) {
    // this condition is often false, we may optimize it; but the JVM hotspot will cache hot code, so ...
    int idBlockBytes = freqStream.readVInt();
    curBlockIdx = 0;
    for (int k = 0; k < idBlockBytes; k++) {
      buffer[k] = freqStream.readInt();
    }
    blockIds = code.decode(buffer, idBlockBytes);
    curBlockSize = blockIds.length;
    int tfBlockBytes = freqStream.readVInt();
    for (int k = 0; k < tfBlockBytes; k++) {
      buffer[k] = freqStream.readInt();
    }
    blockTfs = code.decode(buffer, tfBlockBytes);
    assert curBlockSize == blockTfs.length;
  }
  freq = blockTfs[curBlockIdx];
  doc += blockIds[curBlockIdx++];
  count++;
  if (deletedDocs == null || !deletedDocs.get(doc)) {
    docs[i] = doc;
    freqs[i] = freq;
    ++i;
  }
}

2010/12/15 Michael McCandless luc...@mikemccandless.com: Hi Li Li, That issue has such a big patch, and enough of us are now iterating on it, that we cut a dedicated branch for it. But note that this branch is off of trunk (to be 4.0). You should be able to do this: svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings And then run things in there. I just committed FOR/PFOR prototype codecs from LUCENE-1410 onto that branch, so eg you can run unit tests using those codecs by running ant test -Dtests.codec=PatchedFrameOfRef. Please post patches back if you improve things! We need all the help we can get :) Mike On Wed, Dec 15, 2010 at 5:54 AM, Li Li fancye...@gmail.com wrote: hi Michael you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723 I am not familiar with patches. Do I need to download LUCENE-2723.patch (there are many patches with this name - do I need the latest one?) and LUCENE-2723_termscorer.patch and apply them (patch -p1 < LUCENE-2723.patch)? I just checked out the latest source code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene 2010/12/14 Michael McCandless luc...@mikemccandless.com: Likely you are seeing the startup cost of hotspot compiling the PFOR code? Ie, does your test first warmup the JRE and then do the real test?
I've also found that running -Xbatch produces more consistent results from run to run, however, those results may not be as fast as running w/o -Xbatch. Also, it's better to test on actual data (ie a Lucene index's postings), and in the full context of searching, because then we get a sense of what speedups a real app will see... micro-benching is nearly
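To make the warmup point concrete, a minimal timing skeleton (runAllQueries, searcher and queries are illustrative names, not from any patch):
{code}
// warm up so hotspot compiles the decode loops before anything is timed
for (int warm = 0; warm < 3; warm++) {
  runAllQueries(searcher, queries);   // hypothetical helper that runs the full query set
}
long t0 = System.nanoTime();
runAllQueries(searcher, queries);     // timed run
long elapsedMs = (System.nanoTime() - t0) / 1000000;
System.out.println("elapsed: " + elapsedMs + " ms");
{code}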
Re: LogMergePolicy.setUseCompoundFile/DocStore
Ok perfect ! Shai On Thu, Dec 16, 2010 at 11:23 AM, Earwin Burrfoot ear...@gmail.com wrote: Incoming LUCENE-2814 drops setUseCompoundDocStore() On Thu, Dec 16, 2010 at 12:04, Shai Erera ser...@gmail.com wrote: Hi I find it very annoying that I need to set true/false on these methods whenever I want to control compound files creation. Is it really necessary to allow writing doc stores in non compound files vs. the other index files in a compound file? Does somebody know if this feature is used somewhere? If it's crucial to keep the two methods, then how about introducing a setCompoundMode(true/false) to turn on/off both at once? IndexWriter used to have it, before we switched to IndexWriterConfig and I think it was very useful. Shai -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1993) SolrJ binary update erro when commitWithin is set.
[ https://issues.apache.org/jira/browse/SOLR-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated SOLR-1993: --- Attachment: SOLR-1993-1.4.patch Patch for Solr 1.4 branch SolrJ binary update erro when commitWithin is set. -- Key: SOLR-1993 URL: https://issues.apache.org/jira/browse/SOLR-1993 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.4.1 Reporter: Phil Bingley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-1993-1.4.patch, SOLR-1993.patch, SOLR-1993.patch, SolrExampleBinaryTest.java Solr server is unable to unmarshall a binary update request where the commitWithin property is set on the UpdateRequest class. The client marshalls the request with the following code if (updateRequest.getCommitWithin() != -1) { params.add(commitWithin, updateRequest.getCommitWithin()); } The property is an int and when the server unmarshalls, the following error happens (can't cast to ListString due to an Integer element) SEVERE: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.util.List at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.namedListToSolrParams(JavaBinUpdateRequestCodec.java:213) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.access$100(JavaBinUpdateRequestCodec.java:40) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:131) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:112) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:175) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:141) at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:68) at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46) at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:55) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Workaround is to set the parameter manually as a string value instead of setting using the property on the UpdateRequest class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
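A hedged sketch of the workaround mentioned above, on the client side (setParam is assumed to be available on the SolrJ UpdateRequest/AbstractUpdateRequest of the affected versions; doc and solrServer are placeholders):
{code}
UpdateRequest req = new UpdateRequest();
req.add(doc);
// workaround: pass commitWithin as a plain string parameter instead of
// calling setCommitWithin(int), which triggers the unmarshalling error above
req.setParam("commitWithin", "5000");
req.process(solrServer);
{code}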
[jira] Commented: (SOLR-1993) SolrJ binary update erro when commitWithin is set.
[ https://issues.apache.org/jira/browse/SOLR-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972036#action_12972036 ] Maxim Valyanskiy commented on SOLR-1993: I ported this patch to 1.4 branch and test it in my application. 5 min test passed without any problems, commitWithin parametes works as excepted. Is it possible to include this patch in 1.4.2? SolrJ binary update erro when commitWithin is set. -- Key: SOLR-1993 URL: https://issues.apache.org/jira/browse/SOLR-1993 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.4.1 Reporter: Phil Bingley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-1993-1.4.patch, SOLR-1993.patch, SOLR-1993.patch, SolrExampleBinaryTest.java Solr server is unable to unmarshall a binary update request where the commitWithin property is set on the UpdateRequest class. The client marshalls the request with the following code if (updateRequest.getCommitWithin() != -1) { params.add(commitWithin, updateRequest.getCommitWithin()); } The property is an int and when the server unmarshalls, the following error happens (can't cast to ListString due to an Integer element) SEVERE: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.util.List at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.namedListToSolrParams(JavaBinUpdateRequestCodec.java:213) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.access$100(JavaBinUpdateRequestCodec.java:40) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:131) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:112) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:175) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:141) at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:68) at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46) at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:55) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Workaround is to set the parameter manually as a string value instead of setting using the property on the UpdateRequest class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
Re: strange problem of PForDelta decoder
On the bulkpostings branch you can do something like this: CodecProvider cp = new CodecProvider(); cp.register(new PatchedFrameOfRefCodec()); cp.setDefaultFieldCodec(PatchedFrameOfRef); Then whenever you create an IW or IR, use the advanced method that accepts a CodecProvider. Then the index will always use PForDelta to write/read. I suspect conjunction queries got faster because we no longer skip if the docID we seek is already in the current buffer (currently sized 64). Ie, skip is very costly when the target isn't far. This was sort of an accidental byproduct of forcing even conjunction queries using Standard (vInt) codec to work on block buffers, but I think it's an important opto that we should more generally apply. Skipping for block codecs and Standard/vInt are done w/ the same class now. It's just that the block codec must store the long filePointer where the block starts *and* the int offset into the block, vs Standard codec that just stores a filePointer. On how do we implement bulk read this is the core change on the bulkpostings branch -- we have a new API to separately bulk-read docDeltas, freqs, positionDeltas. But we are rapidly iterating on improving this (and getting to a clean PFor/For impl) now... Mike On Thu, Dec 16, 2010 at 4:29 AM, Li Li fancye...@gmail.com wrote: hi Michael, lucene 4 has so much changes that I don't know how to index and search with specified codec. could you please give me some code snipplets that using PFor codec so I can trace the codes. in you blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html you said The AND query, curiously, got faster; I think this is because I modified its scoring to first try seeking within the block of decoded ints. I am also curious about the result because VINT only need decode part of the doclist while PFor need decode the whole block. But I think with conjuction queries, the main time is used for searching in skiplist. I haven't read your codes yet. But I guess the skiplist for VINT and the skiplist for PFor is different. e.g. lucene 2.9's default skipInterval is 16, so it like level 1 256 level 0 16 32 48 64 80 96 112 128 ... 256 when need skipTo(60) we need read 0 16 32 48 64 in level0 but when use block, e.g. block size is 128, my implementation of skiplist is level 1 256 level 0 128 256 when skipTo(60) we only read 2 item in level0 and decode the first block which contains 128 docIDs How do you implement bulk read? I did like this: I decode a block and cache it in a int array. I think I can buffer up to 100K docIDs and tfs for disjuction queries(it cost less than 1MB memory for each term) SegmentTermDocs.read(final int[] docs, final int[] freqs) ... while (i length count df) { if (curBlockIdx = curBlockSize) { //this condition is often false, we may optimize it. but JVM hotspots will cache hot codes. So ... 
int idBlockBytes = freqStream.readVInt(); curBlockIdx = 0; for (int k = 0; k idBlockBytes; k++) { buffer[k] = freqStream.readInt(); } blockIds = code.decode(buffer,idBlockBytes); curBlockSize = blockIds.length; int tfBlockBytes = freqStream.readVInt(); for (int k = 0; k tfBlockBytes; k++) { buffer[k] = freqStream.readInt(); } blockTfs = code.decode(buffer, tfBlockBytes); assert curBlockSize == decoded.length; } freq = blockTfs[curBlockIdx]; doc += blockIds[curBlockIdx++]; count++; if (deletedDocs == null || !deletedDocs.get(doc)) { docs[i] = doc; freqs[i] = freq; ++i; } } 2010/12/15 Michael McCandless luc...@mikemccandless.com: Hi Li Li, That issue has such a big patch, and enough of us are now iterating on it, that we cut a dedicated branch for it. But note that this branch is off of trunk (to be 4.0). You should be able to do this: svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings And then run things in there. I just committed FOR/PFOR prototype
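Putting Mike's snippet together into a fuller sketch (assumes the bulkpostings branch as of this thread; note the codec name is passed as a String, and the setCodecProvider hook on IndexWriterConfig is my assumption of how the provider is wired in):
{code}
CodecProvider cp = new CodecProvider();
cp.register(new PatchedFrameOfRefCodec());
cp.setDefaultFieldCodec("PatchedFrameOfRef");

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
iwc.setCodecProvider(cp);                      // assumed setter on the branch
IndexWriter writer = new IndexWriter(dir, iwc);
// ... add documents, commit, close ...

// readers must then be opened with the IndexReader.open(...) overload that
// accepts the same CodecProvider (exact signature varies on the branch)
{code}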
[jira] Commented: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972057#action_12972057 ] Michael McCandless commented on LUCENE-2815: Ugh, nice finds Yonik! We should fix these. Maybe MultiFields should just pre-build its MapString,Term on init? You're right, we do reuse MultiFields today (we stuff the instance of MultiFields onto the IndexReader with IndexReader.store/retrieveFields), but I wonder whether we really should? (In fact I thought at one point we decided to stop doing that... yet, we still are... can't remember the details; maybe perf hit was too high eg for MTQs/Solr facets/etc.). What do we need to do to make the publication safe? Is making IR.store/retrieveFields sync'd sufficient? Aside: Java concurrency is a *mess*. I understand why JMM is needed, to get good perf on modern CPUs, but allowing the low level CPU cache coherency requirements to bubble all the way up to complex requirements in the language itself, is a disaster. MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-2814: Attachment: LUCENE-2814.patch First iteration. Passes all tests except TestNRTThreads. Something to do with numDocsInStore and numDocsInRam merged together? Lots of non-critical nocommits (just markers for places I'd like to recheck). DW.docStoreEnabled and *.closeDocStore() have to go, before committing stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch Shared doc stores enables the files for stored fields and term vectors to be shared across multiple segments. We've had this optimization since 2.1 I think. It works best against a new index, where you open an IW, add lots of docs, and then close it. In that case all of the written segments will reference slices a single shared doc store segment. This was a good optimization because it means we never need to merge these files. But, when you open another IW on that index, it writes a new set of doc stores, and then whenever merges take place across doc stores, they must now be merged. However, since we switched to shared doc stores, there have been two optimizations for merging the stores. First, we now bulk-copy the bytes in these files if the field name/number assignment is congruent. Second, we now force congruent field name/number mapping in IndexWriter. This means this optimization is much less potent than it used to be. Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly forcing a flush when it starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent flushing, we can no longer share doc stores. So, I think we should turn off the write-side of shared doc stores to pave the path for DWPT to land on trunk and simplify IW/DW. We still must support reading them (until 5.0), but the read side is far less hairy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2259) Improve analyzer/version handling in Solr
[ https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2259: -- Attachment: SOLR-2259part2.patch here is a patch for branch_3x for part 2. it warns if you are missing the luceneMatchVersion param in your config, informing you that its emulating Lucene 2.4 and that this emulation is deprecated, and that this parameter will be mandatory in 4.0 Improve analyzer/version handling in Solr - Key: SOLR-2259 URL: https://issues.apache.org/jira/browse/SOLR-2259 Project: Solr Issue Type: Task Reporter: Robert Muir Fix For: 3.1, 4.0 Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259part2.patch We added Version for backwards compatibility support in Lucene. We use this to fire deprecated code to emulate old version to ensure index backwards compat. Related: we deprecate old analysis components and eventually remove them. To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere, with the example having the latest. if you don't specify a version in your solrconfig, it defaults to 2.4 though. However, as of LUCENE-2781 2.4 is removed: but users with old configs that don't specify a version should not be silently upgraded to the Version 3.0 emulation... this is bad. Additionally, when users are using deprecated emulation or using deprecated factories they might not know it, and it might come as a surprise if they upgrade, especially if they arent looking at java apis or java code. I propose: # in trunk: we make the solrconfig luceneMatchVersion mandatory. This is simple: Uwe already has a method that will error out if its not present, we just use that. # in 3.x: we warn if you don't specify luceneMatchVersion in solrconfig: telling you that its going to be required in 4.0 and that you are defaulting to 2.4 emulation. For example: Warning: luceneMatchVersion is not specified in solrconfig.xml. Defaulting to 2.4 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0. This parameter will be mandatory in 4.0. # in 3.x,trunk: we warn if you are using a deprecated matchVersion constant somewhere in general, even for a specific tokenizer, telling you that you need to at some point reindex with a current version before you can move to the next release. For example: Warning: you are using 2.4 emulation, at some point you need to bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0 # in 3.x,trunk: we warn if you are using a deprecated TokenStreamFactory so that you know its going to be removed. For example: Warning: the ISOLatin1FilterFactory is deprecated and will be removed in the next release. You should migrate to ASCIIFoldingFilterFactory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
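A rough sketch of the proposed warning, just to illustrate the idea (logger and config-field names are assumed, not taken from the actual patch):
{code}
// warn when the configured luceneMatchVersion still maps to deprecated emulation
if (!luceneMatchVersion.onOrAfter(Version.LUCENE_30)) {
  log.warn("You are using " + luceneMatchVersion + " emulation; it is deprecated in 3.x"
      + " and will be removed in 4.0. Bump luceneMatchVersion and reindex.");
}
{code}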
[jira] Updated: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2288: -- Attachment: SOLR-2288_namedlist.patch Hi Hoss Man, thanks for starting this issue. I looked at your patch, and personally I think NamedList should really be type-safe. If users want to use it in a type-unsafe way, thats fine, but the container itself shouldn't be ListObject. Here's an initial patch (all tests pass)... it also removes the deprecated methods. clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972096#action_12972096 ] Yonik Seeley commented on LUCENE-2815: -- bq. but I wonder whether we really should? (In fact I thought at one point we decided to stop doing that... yet, we still are... can't remember the details; maybe perf hit was too high eg for MTQs/Solr facets/etc.). It wouldn't be solr facets... that code asks for fields() once up front (per facet request) and the rest of the work will dwarf that. I think there probably are a lot of random places that use it where the overhead could be significant. For example IndexReader.deleteDocuments(), ParallelReader, FuzzyLikeThisQuery, and anyone else that uses any of the static methods on Field on a non-segment reader. bq. What do we need to do to make the publication safe? Is making IR.store/retrieveFields sync'd sufficient? More than sufficient. A volatile would also work fine provided that a race shouldn't matter (i.e. more than one MultiFields object could be constructed). bq. Maybe MultiFields should just pre-build its MapString,Term on init? Ouch... those folks with 1000s of fields wouldn't be happy about that. MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
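A minimal sketch of the two publication options being discussed (class and method names are illustrative; this is not the actual IndexReader code):
{code}
class CachedMultiFields {
  // option 1: volatile publish - a race can build MultiFields twice, which is
  // harmless as long as duplicate construction is acceptable
  private volatile Fields multiFields;

  Fields getVolatile(IndexReader r) throws IOException {
    Fields f = multiFields;
    if (f == null) {
      f = buildMultiFields(r);   // hypothetical builder
      multiFields = f;           // safe publication via the volatile write
    }
    return f;
  }

  // option 2: synchronized - stronger, and also prevents duplicate construction
  private Fields multiFieldsSync;

  synchronized Fields getSynchronized(IndexReader r) throws IOException {
    if (multiFieldsSync == null) {
      multiFieldsSync = buildMultiFields(r);
    }
    return multiFieldsSync;
  }
}
{code}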
[jira] Commented: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972107#action_12972107 ] Hoss Man commented on SOLR-2288: Robert: as mentioned, i'm trying to keep a narrow focus on this issue: dealing with warnings that can be cleaned up w/o changing functionality... bq. The goal of this issue should not be to change any functionality or APIs, just deal with each warning ...can we please confine discusions of changing the implementation of NamedList (or any other classes) to distinct issues? like SOLR-912? clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972108#action_12972108 ] Robert Muir commented on SOLR-2288: --- bq. Robert: as mentioned, i'm trying to keep a narrow focus on this issue: dealing with warnings that can be cleaned up w/o changing functionality... Ok but i didnt change the functionality? the functionality is the same, just the implementation is different. This is the root cause of most of the compiler warnings, let's not dodge the issue. clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972114#action_12972114 ] Hoss Man commented on SOLR-2288: bq. just the implementation is different. fair enough -- i ment i was trying to avoid changes to either the APIs or the internals, just focusing on the quick wins that were easy to review at a glance and shouldn't affect the bytecode (CollectionObject instead of Collection; etc...) I don't expect that *all* compiler warnings can be dealt with using trivial patches, but that's what i was trying to focus on in this issue. changes to the internals of specific classes seem like they should be tracked in distinct issues with more visibility clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972116#action_12972116 ] Ryan McKinley commented on SOLR-2288: - For compiler warnings... without changing the API, can we just use NamedList<?> rather than bind it explicitly to Object? clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972119#action_12972119 ] Yonik Seeley commented on LUCENE-2723: -- I tested the optimized index with mike's latest patches (since that's per segment on both branch and trunk). Things are much more in line now... with the branch being anywhere from 2.3% to 5.4% slower, depending on the exact field tested. Speed up Lucene's low level bulk postings read API -- Key: LUCENE-2723 URL: https://issues.apache.org/jira/browse/LUCENE-2723 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_termscorer.patch Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR. I had hacked up a fix for this, described at in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html I'm opening this issue to get that work to a committable point. So... I've worked out a new bulk-read API to address performance bottleneck. It has some big changes over the current bulk-read API: * You can now also bulk-read positions (but not payloads), but, I have yet to cutover positional queries. * The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute). * Deleted docs are not filtered out. * The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be. It's still a work in progress... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
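To make the buffer semantics described in the issue concrete, a small consumer-side sketch (field and method names are illustrative): with the new API, doc deltas must be summed and deleted docs filtered by the caller.
{code}
int doc = 0;
for (int i = 0; i < count; i++) {
  doc += docDeltas[i];                        // the buffer holds deltas, not absolute IDs
  if (deletedDocs == null || !deletedDocs.get(doc)) {
    collect(doc, freqs[i]);                   // freqs are absolute; positions not shown
  }
}
{code}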
[jira] Commented: (SOLR-2288) clean up compiler warnings
[ https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972121#action_12972121 ] Robert Muir commented on SOLR-2288: --- Separately, i just want to say the following about NamedList: All uses of this API should really be reviewed. I'm quite aware that it warns you about the fact that its slow for certain operations, but in my opinion these slow operations such as get(String, int) should be deprecated and removed. Any users that are using NamedList in this way, especially in loops, are very likely using the wrong datastructure. clean up compiler warnings -- Key: SOLR-2288 URL: https://issues.apache.org/jira/browse/SOLR-2288 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch there's a ton of compiler warning in the solr tree, and it's high time we cleaned them up, or annotate them to be suppressed so we can start making a bigger stink when/if code is added to the tree thta produces warnings (we'll never do a good job of noticing new warnings when we have ~175 existing ones) Using this issue to track related commits The goal of this issue should not be to change any functionality or APIs, just deal with each warning in the most appropriate way; * fix generic declarations * add SuppressWarning anotation if it's safe in context -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
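To illustrate the point about lookups in loops: NamedList is ordered and allows repeated names, so name-based get() is a linear scan. A sketch of the usual alternative when many lookups are needed (list is an assumed NamedList<Object> variable; later duplicate names overwrite earlier ones here):
{code}
// copy once into a Map, then do O(1) lookups instead of repeated linear scans
Map<String, Object> byName = new HashMap<String, Object>();
for (int i = 0; i < list.size(); i++) {
  byName.put(list.getName(i), list.getVal(i));
}
Object defaults = byName.get("defaults");
{code}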
[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2694: Attachment: LUCENE-2694.patch Attaching current state - all tests pass for me and luceneutil brings consistent results with trunk.
{code}
Query                               QPS trunk   QPS termstate   Pct diff
unit~2.0                                14.70           14.39      -2.1%
united~2.0                               6.91            6.83      -1.1%
united~1.0                               7.42            7.38      -0.6%
unit state                              12.31           12.37       0.5%
unit~1.0                                15.41           15.49       0.5%
uni*                                     7.18            7.22       0.6%
un*d                                     7.97            8.04       0.9%
unit*                                   12.89           13.09       1.6%
+unit +state                            28.16           28.64       1.7%
+nebraska +state                        81.26           82.67       1.7%
spanNear([unit, state], 10, true)       11.60           11.83       2.0%
state                                   40.50           41.47       2.4%
spanFirst(unit, 5)                      47.65           48.84       2.5%
unit state                              17.72           18.19       2.7%
u*d                                      4.27            4.48       5.0%
{code}
Those are the results I have for now. Fuzzy only expands to 50 terms so that might not be very meaningful. I re-added the TermCache for this patch though... Will attach more info tomorrow. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972127#action_12972127 ] Robert Muir commented on LUCENE-2694: - We shouldn't lose the clone() optimization in StandardPostingsReader... the class is final so it should use 'copy' instead of calling super.clone() This is important for -client. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
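A sketch of the kind of optimization being referred to (class and helper names here are hypothetical, not the actual StandardPostingsReader code): a final class can hand back an explicit copy instead of going through Object.clone(), which tends to be cheaper, notably under -client.
{code}
// inside a final TermState implementation (names hypothetical):
@Override
public StandardTermState clone() {
  StandardTermState copy = new StandardTermState();
  copy.copyFrom(this);        // explicit field-by-field copy instead of super.clone()
  return copy;
}
{code}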
Example documents and geospatial
The example docs we distribute have a bunch of stores that have the exact same location. That led to some head scratching about why changing the distance in the example queries seemed to make no difference in the number of returned results, and then all of a sudden it reduced the number of hits drastically. Any objections to a patch that adds an arbitrary distance (say 1/4 mile or so) to all of the stores in the example docs that have the same location? If not, I'll put up a JIRA and attach a patch. Erick
[jira] Commented: (SOLR-2259) Improve analyzer/version handling in Solr
[ https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972150#action_12972150 ] Robert Muir commented on SOLR-2259: --- I committed part 2 in revision 1050064. Improve analyzer/version handling in Solr - Key: SOLR-2259 URL: https://issues.apache.org/jira/browse/SOLR-2259 Project: Solr Issue Type: Task Reporter: Robert Muir Fix For: 3.1, 4.0 Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259part2.patch We added Version for backwards compatibility support in Lucene. We use this to fire deprecated code to emulate old version to ensure index backwards compat. Related: we deprecate old analysis components and eventually remove them. To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere, with the example having the latest. if you don't specify a version in your solrconfig, it defaults to 2.4 though. However, as of LUCENE-2781 2.4 is removed: but users with old configs that don't specify a version should not be silently upgraded to the Version 3.0 emulation... this is bad. Additionally, when users are using deprecated emulation or using deprecated factories they might not know it, and it might come as a surprise if they upgrade, especially if they arent looking at java apis or java code. I propose: # in trunk: we make the solrconfig luceneMatchVersion mandatory. This is simple: Uwe already has a method that will error out if its not present, we just use that. # in 3.x: we warn if you don't specify luceneMatchVersion in solrconfig: telling you that its going to be required in 4.0 and that you are defaulting to 2.4 emulation. For example: Warning: luceneMatchVersion is not specified in solrconfig.xml. Defaulting to 2.4 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0. This parameter will be mandatory in 4.0. # in 3.x,trunk: we warn if you are using a deprecated matchVersion constant somewhere in general, even for a specific tokenizer, telling you that you need to at some point reindex with a current version before you can move to the next release. For example: Warning: you are using 2.4 emulation, at some point you need to bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0 # in 3.x,trunk: we warn if you are using a deprecated TokenStreamFactory so that you know its going to be removed. For example: Warning: the ISOLatin1FilterFactory is deprecated and will be removed in the next release. You should migrate to ASCIIFoldingFilterFactory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2815: --- Fix Version/s: 4.0 MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972173#action_12972173 ] Michael McCandless commented on LUCENE-2814: OK I dug here... the reason why TestNRTThreads fails is because you moved the numDocsInRAM++ out of DW.getThreadState into WaitQueue.writeDocument. When we buffer del terms in DW.deleteTerm/Terms/Query/Queries, we grab the current numDocsInRAM as the docID upto, to record when it comes time to apply the delete which docID we must stop at. But with your change, this value is now an undercount, since numDocsInRAM is now acting like numDocsInStore. One way to fix this would be to change the delete methods to use nextDocID instead of numDocsInRAM? But I think I'd prefer to put back numDocsInRAM++ in getThreadState... stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch Shared doc stores enables the files for stored fields and term vectors to be shared across multiple segments. We've had this optimization since 2.1 I think. It works best against a new index, where you open an IW, add lots of docs, and then close it. In that case all of the written segments will reference slices a single shared doc store segment. This was a good optimization because it means we never need to merge these files. But, when you open another IW on that index, it writes a new set of doc stores, and then whenever merges take place across doc stores, they must now be merged. However, since we switched to shared doc stores, there have been two optimizations for merging the stores. First, we now bulk-copy the bytes in these files if the field name/number assignment is congruent. Second, we now force congruent field name/number mapping in IndexWriter. This means this optimization is much less potent than it used to be. Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly forcing a flush when it starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent flushing, we can no longer share doc stores. So, I think we should turn off the write-side of shared doc stores to pave the path for DWPT to land on trunk and simplify IW/DW. We still must support reading them (until 5.0), but the read side is far less hairy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972176#action_12972176 ] David Smiley commented on LUCENE-2611: -- It turns out that IntelliJ was rewriting my $MODULE_DIR$/../../ paths to paths relative to a path variable I defined on my system, and that is intended behavior according to JetBrains. I removed the path variable... I can live without it after all, and that problem doesn't exist anymore. IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Example documents and geospatial
I don't think they are all the same, at least not in trunk. I believe there are a few near San Fran, some near Buffalo, MN (my hometown ;-) ), and some in Oklahoma. You can see this when you hit the /browse url. On Dec 16, 2010, at 11:32 AM, Erick Erickson wrote: The example docs we distribute have a bunch of stores that have the exact same location. That lead to some head scratching about why changing the distance in the example queries seemed to make no difference in the number of returned results then all of a sudden it reduced the number of hits drastically. Any objections to a patch that adds an arbitrary distance (say 1/4 mile or so) to all of the stores in the example docs that have the same location? If not, I'll put up a JIRA and attach a patch. Erick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Example documents and geospatial
On Thu, Dec 16, 2010 at 11:32 AM, Erick Erickson erickerick...@gmail.com wrote: The example docs we distribute have a bunch of stores that have the exact same location. That lead to some head scratching about why changing the distance in the example queries seemed to make no difference in the number of returned results then all of a sudden it reduced the number of hits drastically. Any objections to a patch that adds an arbitrary distance (say 1/4 mile or so) to all of the stores in the example docs that have the same location? If not, I'll put up a JIRA and attach a patch. Erick +1 Try not to put 'em in a lake or something though ;-) -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-1410: Attachment: LUCENE-1410.patch This patch is to add codec support for the PForDelta compression algorithm. Changes by Hao Yan (hyan2...@gmail.com) In summary, I added five files to support and test the codec.

In src:
1. org.apache.lucene.index.codecs.pfordelta.PForDelta.java
2. org.apache.lucene.index.codecs.pfordelta.Simple16.java
3. org.apache.lucene.index.codecs.PForDeltaFixedBlockCodec.java
4. org.apache.lucene.index.codecs.intblock.FixedIntBlockIndexOutputWithGetElementNum.java

In test:
5. org.apache.lucene.index.codecs.intblock.TestPForDeltaFixedIntBlockCodec.java

1) In particular, the first class, PForDelta, is the core implementation of the PForDelta algorithm; it compresses exceptions using Simple16, which is implemented in the second class, Simple16. 2) The third class, PForDeltaFixedBlockCodec, is similar to org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec in the tests, except that it uses PForDelta to encode the data in the buffer. 3) The fourth class is almost the same as org.apache.lucene.index.codecs.intblock.FixedIntBlockIndexOutput, except that it provides an additional public method to retrieve the value of the upto field, which is a private field in FixedIntBlockIndexOutput. The reason I added this method is that the number of elements in the block that hold meaningful values is not always equal to the blockSize or the buffer size, since the last block/buffer of a stream usually contains fewer entries. In that case, I fill all elements after the meaningful ones with 0s, so we always compress one entire block. 4) The last class is the unit test for PForDeltaFixedIntBlockCodec, which is very similar to org.apache.lucene.index.codecs.intblock.TestIntBlockCodec. I also changed the LuceneTestCase class to add the new PForDeltaFixedIntBlockCodec. The unit tests and all Lucene tests have passed. PFOR implementation --- Key: LUCENE-1410 URL: https://issues.apache.org/jira/browse/LUCENE-1410 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Paul Elschot Priority: Minor Fix For: Bulk Postings branch Attachments: autogen.tgz, for-summary.txt, LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java Original Estimate: 21840h Remaining Estimate: 21840h Implementation of Patched Frame of Reference. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
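A small sketch of the zero-padding described in point 3 (the compressOneBlock call is an assumed API from the patch description, not verified against it):
{code}
// pad the tail block with zeros so the encoder always sees a full block
int[] block = new int[blockSize];
System.arraycopy(buffer, 0, block, 0, numValidElements);         // remaining slots stay 0
int[] compressed = PForDelta.compressOneBlock(block, blockSize); // assumed API
{code}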
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972227#action_12972227 ] Michael Busch commented on LUCENE-2814: --- The shared doc stores are actually already completely removed in the realtime branch (part of LUCENE-2324). Does someone want to help with the merge? Then we can land the realtime branch (which is pretty much only DWPT and removing doc stores) in trunk sometime soon. stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch Shared doc stores enable the files for stored fields and term vectors to be shared across multiple segments. We've had this optimization since 2.1, I think. It works best against a new index, where you open an IW, add lots of docs, and then close it. In that case all of the written segments will reference slices of a single shared doc store segment. This was a good optimization because it means we never need to merge these files. But, when you open another IW on that index, it writes a new set of doc stores, and then whenever merges take place across doc stores, they must now be merged. However, since we switched to shared doc stores, there have been two optimizations for merging the stores. First, we now bulk-copy the bytes in these files if the field name/number assignment is congruent. Second, we now force congruent field name/number mapping in IndexWriter. This means this optimization is much less potent than it used to be. Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly forcing a flush when it starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent flushing, we can no longer share doc stores. So, I think we should turn off the write side of shared doc stores to pave the path for DWPT to land on trunk and simplify IW/DW. We still must support reading them (until 5.0), but the read side is far less hairy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972258#action_12972258 ] Yonik Seeley commented on LUCENE-2815: -- bq. It looks like MultiReaderBits also has issues with safe object publication. Actually, it looks like this one is OK with most of our current code. SegmentReader.getDeletedDocs() returns an object stored in a volatile, so that counts as a safe publish. Other implementations seem to either throw an exception or directly call a segment reader. One exception is InstantiatedIndex (I think). We can't call getDeletedDocs() just once up-front, because an IndexReader may still be used to delete documents. MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
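As a side note on the safe-publication point above: the pattern being relied on is that an object fully constructed and then written to a volatile field is safely visible to other threads. A tiny sketch, with hypothetical names rather than Lucene's actual classes:

public class SafePublicationSketch {
    static final class Bits {
        final long[] words;
        Bits(int numBits) { words = new long[(numBits + 63) >>> 6]; }
        boolean get(int i) { return (words[i >>> 6] & (1L << (i & 63))) != 0; }
    }

    // Writing a fully constructed Bits to a volatile field establishes a
    // happens-before edge, so readers never observe a half-built object.
    private volatile Bits deletedDocs;

    void markDeletionsChanged(int numDocs) {
        deletedDocs = new Bits(numDocs);   // safe publication via a volatile write
    }

    Bits getDeletedDocs() {
        return deletedDocs;                // may be null if nothing was deleted yet
    }

    public static void main(String[] args) {
        SafePublicationSketch r = new SafePublicationSketch();
        r.markDeletionsChanged(1000);
        System.out.println(r.getDeletedDocs().get(5));
    }
}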
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972275#action_12972275 ] Michael Busch commented on LUCENE-2814: --- Well I need to merge with the recent changes in trunk (especially LUCENE-2680). The merge is pretty hard, but I'm planning to spend most of my weekend on it. If I can get most tests to pass again (most were passing before the merge), then I think the only outstanding thing is LUCENE-2573 before we could land it in trunk. stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972281#action_12972281 ] Yonik Seeley commented on LUCENE-2815: -- I was going to fix InstantiatedIndex, but while I was in there, I saw a lot of non-threadsafe code. I think that really deserves its own issue. What range of docs is InstantiatedIndex faster for, and is it something we want to continue to maintain? MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972286#action_12972286 ] Michael McCandless commented on LUCENE-2814: I think taking things one step at a time would be good here? Ie remove doc stores from trunk, let that bake on trunk for a while, then merge to RT? So that what then remains on RT is DWPT / tiered flushing? Else RT is a monster change? stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Example documents and geospatial
Grant: Yep, there were fewer than I remembered, but still half a dozen or so ... but I remember way more than that (somehow 16 comes to mind)... so obviously some gnome has been in there already... Not all of the ones I remember were the same at all, but enough were that it was puzzling. Erik: Lake? Why should I care about a lake? Actually, I never even thought about it, glad you pointed it out. OK, I'll put up a patch today or tomorrow. Anybody want to apply the patch for 2275 (whitespace in mm parameter causes parse exception)? Erick On Thu, Dec 16, 2010 at 4:37 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Dec 16, 2010 at 11:32 AM, Erick Erickson erickerick...@gmail.com wrote: The example docs we distribute have a bunch of stores that have the exact same location. That led to some head scratching about why changing the distance in the example queries seemed to make no difference in the number of returned results, then all of a sudden it reduced the number of hits drastically. Any objections to a patch that adds an arbitrary distance (say 1/4 mile or so) to all of the stores in the example docs that have the same location? If not, I'll put up a JIRA and attach a patch. Erick +1 Try not to put 'em in a lake or something though ;-) -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
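For illustration only, a sketch of the kind of tweak being proposed for the example docs; the coordinates here are made up, and a quarter mile is taken as roughly 0.25/69 of a degree of latitude:

public class JitterLocationsSketch {
    public static void main(String[] args) {
        // made-up sample coordinates: three "stores" sharing one location
        double[][] stores = {
            {45.17614, -93.87341},
            {45.17614, -93.87341},
            {45.17614, -93.87341},
        };
        double quarterMileLatDeg = 0.25 / 69.0;          // ~1/4 mile expressed in degrees of latitude
        for (int i = 0; i < stores.length; i++) {
            stores[i][0] += i * quarterMileLatDeg;       // nudge each duplicate a bit further north
            System.out.printf("store %d: %.5f,%.5f%n", i, stores[i][0], stores[i][1]);
        }
    }
}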
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972288#action_12972288 ] Michael Busch commented on LUCENE-2814: --- bq. I think taking things one step at a time would be good here? Probably still a smaller change than flex indexing ;) But yeah in general I agree that we should do things more incrementally. I think that's a mistake I've made with the RT branch so far. In this particular case it's just a bit sad to redo all this work now, because I think I got the removal of doc stores right in RT and all related tests to pass. stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972298#action_12972298 ] Earwin Burrfoot commented on LUCENE-2814: - So, what's the plan? stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] [Take 3] Release PyLucene 2.9.4-1 and 3.0.3-1
On Sun, 12 Dec 2010, Andi Vajda wrote: A patch that improves the finding of jni.h on Mac OS X was integrated. It made it worth blocking this release and preparing new release artifacts. No one voted on the [Take 2] artifacts, and I hope this is not inconveniencing anyone. I also hope that this is it for PyLucene 2.9.4/3.0.3 :-) So please vote to release the artifacts available from http://people.apache.org/~vajda/staging_area/ as PyLucene 2.9.4 and PyLucene 3.0.3. Here is my +1. This vote has now passed. Thank you to all who voted! The releases should be announced shortly. Andi..
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972302#action_12972302 ] Michael Busch commented on LUCENE-2814: --- bq. So, what's the plan? I can't really work on this much before Saturday. But during the weekend I can work on the RT merge and maybe try to pull out the docstore removal changes and create a separate patch. Have to see how hard that is. If it's not too difficult I'll post a separate patch, otherwise I'll commit the merge to RT and maybe convince you guys to help a bit with getting the RT branch ready for landing in trunk? :) stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972316#action_12972316 ] Earwin Burrfoot commented on LUCENE-2814: - Instead of you pulling out docstore removal, I can finish that patch. But then merging's gonna be even greater bitch. Probably. But maybe not. Do you do IRC? It can be faster to discuss in realtime, and you could also tell what help you need with the branch. stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2815) MultiFields not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-2815: - Attachment: LUCENE-2815.patch Here's a patch that uses a ConcurrentHashMap for the Terms cache, and makes IndexReader.fields volatile. That IndexReader.fields variable is just the type of thing that could be stored in a generic cache on the IndexReader, if/when we get something like that. MultiFields not thread safe --- Key: LUCENE-2815 URL: https://issues.apache.org/jira/browse/LUCENE-2815 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2815.patch MultiFields looks like it has thread safety issues -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
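For context, here is a small sketch of the two techniques the patch comment mentions, using hypothetical names rather than the actual MultiFields sources: a ConcurrentHashMap with putIfAbsent for the cache, and a volatile field for safe publication of a lazily built value.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TermsCacheSketch {
    // Thread-safe cache: a concurrent map plus putIfAbsent instead of an
    // unsynchronized check-then-put against a plain HashMap.
    private final ConcurrentMap<String, String> termsCache = new ConcurrentHashMap<String, String>();

    // volatile so a lazily built value is safely published to other threads
    private volatile String fields;

    String terms(String field) {
        String t = termsCache.get(field);
        if (t == null) {
            String built = buildTerms(field);
            String prev = termsCache.putIfAbsent(field, built);
            t = (prev != null) ? prev : built;           // keep whichever instance won the race
        }
        return t;
    }

    private String buildTerms(String field) {
        return "terms(" + field + ")";                   // stand-in for the real construction
    }

    void setFields(String f) { fields = f; }             // volatile write: safe publication
    String getFields() { return fields; }                // volatile read

    public static void main(String[] args) {
        TermsCacheSketch s = new TermsCacheSketch();
        System.out.println(s.terms("body"));
    }
}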
[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-2814: Attachment: LUCENE-2814.patch Patch updated to trunk, no nocommits, no *.closeDocStore(), tests pass. SegmentWriteState vs DocumentsWriter bothers me: we track flushed files in both, and we inconsistently get the current segment from both of them. stop writing shared doc stores across segments -- Key: LUCENE-2814 URL: https://issues.apache.org/jira/browse/LUCENE-2814 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 3.1, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Lucene-trunk - Build # 1397 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1397/ All tests passed Build Log (for compile errors): [...truncated 18399 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org